[Xapian-tickets] [Xapian] #385: Expanding docids (etc) beyond 32 bit types

Xapian nobody at xapian.org
Thu Jun 25 16:46:33 BST 2009


#385: Expanding docids (etc) beyond 32 bit types
-------------------------+--------------------------------------------------
 Reporter:  james        |       Owner:  olly     
     Type:  enhancement  |      Status:  new      
 Priority:  normal       |   Milestone:           
Component:  Other        |     Version:  SVN trunk
 Severity:  minor        |    Keywords:           
Blockedby:               |    Platform:  All      
 Blocking:               |  
-------------------------+--------------------------------------------------

Comment(by james):

 Replying to [comment:2 olly]:

 > I'm not keen on adding new public types for this.  They aren't useful in
 > themselves, and these types don't all need to be the same size - they're
 > just all "int" currently as that's "big enough" (or at least were for
 > most people) on all modern platforms - so a common type for them doesn't
 > really make logical sense either.

 Yes, that makes sense. It just seemed a convenient place to shove it so I
 could play around; I wasn't actually expecting so many of the tests to
 pass without tweaking.

 So a patch that just changes the typedefs directly without an intermediate
 type is appropriate? (I guess a define from config.h?)

 > If valueno needs changing, that's a bug.

 That's what I thought, but I don't understand the details well enough to
 figure it out. The warning is:

 ./common/valuelist.h:73: warning: ‘virtual void
 Xapian::ValueIterator::Internal::skip_to(Xapian::docid)’ was hidden
 api/documentvaluelist.h:59: warning:   by ‘void
 DocumentValueList::skip_to(Xapian::valueno)’

 When Xapian::docid and Xapian::valueno are typedef'd to the same type, the
 two signatures are identical, so this doesn't matter.
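
 A minimal sketch of what the warning is pointing at, using hypothetical
 stand-in types and classes rather than the real Xapian ones. It assumes
 docid has been widened to 64 bits while valueno stays at 32 bits: the
 derived skip_to() then becomes a separate function that hides the base
 virtual instead of overriding it, so a call through the base class never
 reaches it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for Xapian::docid and Xapian::valueno, given
// different widths so that they are no longer the same type.
typedef std::uint64_t docid;
typedef std::uint32_t valueno;

struct ValueListBase {
    virtual ~ValueListBase() {}
    virtual std::string skip_to(docid) { return "base"; }
};

// Parameter declared as valueno: this hides ValueListBase::skip_to(docid)
// rather than overriding it.
struct HidingList : ValueListBase {
    std::string skip_to(valueno) { return "derived"; }
};

// Parameter declared as docid: a genuine override.
struct OverridingList : ValueListBase {
    std::string skip_to(docid) override { return "derived"; }
};

// Calls skip_to() the way generic code would, through the base class.
std::string call_through_base(ValueListBase& list) {
    return list.skip_to(docid(1));
}

// Shows which implementation each class actually dispatches to.
void demo() {
    HidingList hiding;
    OverridingList overriding;
    assert(call_through_base(hiding) == "base");
    assert(call_through_base(overriding) == "derived");
}
```

 If both typedefs name the same type, HidingList::skip_to() has the same
 signature as the base function and overrides it, which is why the problem
 only shows up once the widths differ.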

 > Ideally termcount shouldn't need changing (more than 4 billion terms
 > per document doesn't seem like a sane scenario, and isn't going to work
 > sanely with the current termlist storage anyway), but we would need a
 > new type for collection frequency.  We probably should have one anyway
 > since the collection frequency of a term which occurs many times in
 > many documents will for many users probably overflow 32 bits before you
 > add 4 billion documents.

 Yes, that all makes sense. I was probably overly liberal in the types I
 changed, having been bitten by the valueno thing and not bothered to fix
 it properly.
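
 A rough sketch of the arithmetic behind the collection frequency point.
 The figures used in the note below (500 million documents, 10 occurrences
 each) are made up for illustration, and the function names are
 hypothetical, not part of Xapian.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Collection frequency of a term that occurs `per_doc` times in each of
// `docs` documents, computed in 64 bits.
std::uint64_t collection_freq(std::uint64_t docs, std::uint64_t per_doc) {
    // Guard the 64-bit multiplication itself against wrapping.
    assert(per_doc == 0 ||
           docs <= std::numeric_limits<std::uint64_t>::max() / per_doc);
    return docs * per_doc;
}

// True if the value is too large for a 32-bit counter.
bool overflows_32_bits(std::uint64_t value) {
    return value > std::numeric_limits<std::uint32_t>::max();
}
```

 With those figures, collection_freq(500000000, 10) is 5 billion, which no
 longer fits in 32 bits even though the document count itself is well
 under 4 billion.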

 There's another problem if I don't switch termcount:

 backends/chert/chert_postlist.cc:1137: error: cannot convert
 ‘Xapian::termcount*’ to ‘Xapian::doccount*’ for argument ‘3’ to
 ‘Xapian::docid read_start_of_first_chunk(const char**, const char*,
 Xapian::doccount*, Xapian::termcount*)’

 And if termpos and termcount don't match:

 api/../backends/multi/multi_termlist.h:51: error: conflicting return type
 specified for ‘virtual Xapian::termpos MultiTermList::positionlist_count()
 const’
 ./common/termlist.h:101: error:   overriding ‘virtual Xapian::termcount
 Xapian::TermIterator::Internal::positionlist_count() const’
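
 Both errors have the same shape: two typedefs that used to name the same
 type now name different ones. A small sketch of the pointer case, assuming
 doccount is widened to 64 bits and termcount is left at 32 bits (the
 widths, and the narrow_checked() helper, are assumptions for illustration
 and not how the Xapian sources handle it):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

typedef std::uint64_t doccount;   // assumed widened to 64 bits
typedef std::uint32_t termcount;  // assumed left at 32 bits

// Pointers to differently sized integers are unrelated types, so a
// termcount* cannot be passed where a doccount* is expected.
static_assert(!std::is_convertible<termcount*, doccount*>::value,
              "termcount* must not convert to doccount*");

// Hypothetical helper: copy a wide count into a termcount, reporting
// failure instead of silently truncating.
bool narrow_checked(doccount wide, termcount* out) {
    assert(out != nullptr);
    if (wide > std::numeric_limits<termcount>::max()) return false;
    *out = static_cast<termcount>(wide);
    return true;
}
```

 Reading into a temporary of the type the callee expects and then narrowing
 with a check is one way to keep such a call compiling; whether the right
 fix is that or changing the variable's declared type depends on what the
 value actually represents.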

 > A stub database can refer to any database backend(s), and there are
 > several stub tests which are run over various actual backends, but if
 > the remote tests fail, then stub tests run over the remote backend are
 > likely to as well!

 Yes, that sounds like what happened :-) It's stubdb2, which is just
 `remote :../bin/xapian-progsrv .chert/db=apitest_simpledata`.

 > The remote protocol uses variable length integer encodings produced by
 > templated functions, so I'd actually expect it would just work.  Hard to
 > guess what might be wrong.  Running xapian-tcpsrv by hand in one
 > terminal and performing a search on it from another (e.g. via a stub db
 > file and examples/quest) might show what's going on.

 Server error:

 Got exception NetworkError: Bad encoded length: insufficient data

 I've tried investigating further, but my gdb-fu isn't really strong enough
 when forking starts getting involved :-/
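
 For reference, a generic sketch of a templated variable length integer
 encoding of the kind described above. This is a plain 7-bits-per-byte
 scheme written for illustration; it is not Xapian's actual wire format,
 the function names are made up, and it is not a diagnosis of the error
 above. It is only intended for unsigned types of 32 bits or more.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Encode an unsigned integer, 7 bits per byte, low bits first. The high
// bit of each byte means "more bytes follow".
template <class T>
std::string encode_uint(T value) {
    std::string out;
    while (value >= 0x80) {
        out += static_cast<char>((value & 0x7f) | 0x80);
        value >>= 7;
    }
    out += static_cast<char>(value);
    return out;
}

// Decode a value of type T starting at `pos`, advancing `pos` past it.
// Throws if the input ends mid-value or the value does not fit in T.
template <class T>
T decode_uint(const std::string& in, std::size_t& pos) {
    T result = 0;
    unsigned shift = 0;
    while (true) {
        if (pos >= in.size())
            throw std::runtime_error("insufficient data");
        unsigned char byte = static_cast<unsigned char>(in[pos++]);
        T chunk = byte & 0x7f;
        // Reject bits that would be shifted out of T.
        if (shift >= sizeof(T) * 8 || (chunk << shift) >> shift != chunk)
            throw std::runtime_error("value too large for type");
        result |= chunk << shift;
        if (!(byte & 0x80)) return result;
        shift += 7;
    }
}
```

 Because the byte stream carries no width information, a value encoded
 from a 64 bit type decodes correctly into a 64 bit type, but the decoder
 has to detect for itself when the target type is too small or the data is
 truncated.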

 > Right now, the sanest approach is probably just for people who actually
 > need it to enable it - if you're handling more than 4 billion documents,
 > having to work with a specially built package isn't likely to be a huge
 > deal.

 Yes, that's what I thought. I was thinking that keeping a patch around
 that applies reasonably cleanly would mean that people with that
 requirement could build it 64 bit and see what happens. Once we get a bit
 of practical feedback that it actually works, it could be built in
 properly as a compile option.

-- 
Ticket URL: <http://trac.xapian.org/ticket/385#comment:3>
Xapian <http://xapian.org/>