[Xapian-tickets] [Xapian] #385: Expanding docids (etc) beyond 32 bit types
Xapian
nobody at xapian.org
Thu Jun 25 16:46:33 BST 2009
#385: Expanding docids (etc) beyond 32 bit types
-------------------------+--------------------------------------------------
Reporter: james | Owner: olly
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Other | Version: SVN trunk
Severity: minor | Keywords:
Blockedby: | Platform: All
Blocking: |
-------------------------+--------------------------------------------------
Comment(by james):
Replying to [comment:2 olly]:
> I'm not keen on adding new public types for this. They aren't useful in
> themselves, and these types don't all need to be the same size - they're
> just all "int" currently as that's "big enough" (or at least were for
> most people) on all modern platforms - so a common type for them doesn't
> really make logical sense either.
Yes, that makes sense. It just seemed a convenient place to shove it so I
could play around; I wasn't actually expecting so many of the tests to
pass without tweaking.
So a patch that just changes the typedefs directly without an intermediate
type is appropriate? (I guess a define from config.h?)
> If valueno needs changing, that's a bug.
That's what I thought, but I don't understand the code in enough detail
to figure it out. The warning is:
./common/valuelist.h:73: warning: ‘virtual void Xapian::ValueIterator::Internal::skip_to(Xapian::docid)’ was hidden
api/documentvaluelist.h:59: warning: by ‘void DocumentValueList::skip_to(Xapian::valueno)’
When Xapian::docid and Xapian::valueno are typedefs for the same type, the
two signatures are identical and the derived method overrides the base
one, so this doesn't matter.
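
To make the warning concrete, here is a minimal sketch of the hiding behaviour. The classes are simplified stand-ins I've made up for illustration, not the real Xapian classes, and the typedef widths (64-bit docid, 32-bit valueno) are an assumption about the patched build:

```cpp
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the Xapian typedefs: docid widened to 64 bits,
// valueno left at 32 bits (an assumption for this sketch).
typedef std::uint64_t docid;
typedef std::uint32_t valueno;

// Simplified stand-in for Xapian::ValueIterator::Internal.
struct ValueListBase {
    virtual ~ValueListBase() {}
    virtual std::string skip_to(docid) { return "base"; }
};

// Simplified stand-in for DocumentValueList. The parameter type differs from
// the base class's, so this declares a new function that hides the virtual
// one instead of overriding it.
struct DocValueList : ValueListBase {
    std::string skip_to(valueno) { return "derived"; }
};

// Calls through a base reference, as generic iterator code would.
std::string call_via_base(ValueListBase& v) { return v.skip_to(docid(5)); }
```

With both typedefs pointing at the same type, `call_via_base` would reach the derived method; with different widths it silently runs the base version, which is what the compiler is warning about.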
> Ideally termcount shouldn't need changing (more than 4 billion terms per
> document doesn't seem like a sane scenario, and isn't going to work
> sanely with the current termlist storage anyway), but we would need a new
> type for collection frequency. We probably should have one anyway since
> the collection frequency of a term which occurs many times in many
> documents will for many users probably overflow 32 bits before you add 4
> billion documents.
Yes, that all makes sense. I was probably overly-liberal in the types I
changed, bitten by the valueno thing and not bothering to fix it properly.
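
As a quick check on the collection frequency point, a sketch with made-up numbers (the function names and the figures are hypothetical, chosen only to illustrate the arithmetic):

```cpp
#include <cstdint>
#include <limits>

// Total occurrences of one term if it appears `per_doc` times in each of
// `docs` documents (a deliberately crude model).
std::uint64_t collection_freq(std::uint64_t docs, std::uint64_t per_doc) {
    return docs * per_doc;
}

// True if n can be stored in a 32-bit unsigned counter without overflow.
bool fits_in_32_bits(std::uint64_t n) {
    return n <= std::numeric_limits<std::uint32_t>::max();
}
```

For example, 500 million documents still fit comfortably in a 32-bit docid, but a term occurring 10 times in each of them gives a collection frequency of 5 billion, which does not fit.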
There's another problem if I don't switch termcount:
backends/chert/chert_postlist.cc:1137: error: cannot convert ‘Xapian::termcount*’ to ‘Xapian::doccount*’ for argument ‘3’ to ‘Xapian::docid read_start_of_first_chunk(const char**, const char*, Xapian::doccount*, Xapian::termcount*)’
And if termpos and termcount don't match:
api/../backends/multi/multi_termlist.h:51: error: conflicting return type specified for ‘virtual Xapian::termpos MultiTermList::positionlist_count() const’
./common/termlist.h:101: error: overriding ‘virtual Xapian::termcount Xapian::TermIterator::Internal::positionlist_count() const’
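
Both errors come down to code that mixes two typedef names and only compiles while they alias the same type. A small sketch of the two rules involved, using type traits so it compiles either way (the helper names are hypothetical, and the widths are assumptions, not what any particular build uses):

```cpp
#include <cstdint>
#include <type_traits>

// A pointer to one integer type only converts implicitly to a pointer to
// another if the two are the same type; this is the rule behind the
// read_start_of_first_chunk() error.
template <typename From, typename To>
constexpr bool pointer_converts() {
    return std::is_convertible<From*, To*>::value;
}

// For non-class return types, an overriding virtual function must return
// exactly the same type as the function it overrides; this is the rule
// behind the positionlist_count() error.
template <typename BaseRet, typename DerivedRet>
constexpr bool override_return_ok() {
    return std::is_same<BaseRet, DerivedRet>::value;
}
```

With a 32-bit and a 64-bit type both helpers return false, and with two 32-bit typedefs both return true, which is why the unpatched tree compiles.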
> A stub database can refer to any database backend(s), and there are
> several stub tests which are run over various actual backends, but if the
> remote tests fail, then stub tests run over the remote backend are likely
> to as well!
Yes, that sounds like what happened :-) It's stubdb2, which is just
`remote :../bin/xapian-progsrv .chert/db=apitest_simpledata`.
> The remote protocol uses variable length integer encodings produced by
> templated functions, so I'd actually expect it would just work. Hard to
> guess what might be wrong. Running xapian-tcpsrv by hand in one terminal
> and performing a search on it from another (e.g. via a stub db file and
> examples/quest) might show what's going on.
Server error:
Got exception NetworkError: Bad encoded length: insufficient data
I've tried investigating further, but my gdb-fu isn't really strong enough
when forking starts getting involved :-/
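
For reference, this is roughly the kind of templated variable length encoding being discussed. It is a generic 7-bits-per-byte sketch of my own, not Xapian's actual wire format, and the function names are hypothetical. It only illustrates how a decoder can run out of data or receive a value too wide for its type; I don't know that either is the cause of the error above.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Append v to out, 7 bits per byte, low bits first. The high bit of each
// byte says whether another byte follows.
template <typename T>
void encode_uint(std::string& out, T v) {
    while (v >= 0x80) {
        out += static_cast<char>((v & 0x7f) | 0x80);
        v >>= 7;
    }
    out += static_cast<char>(v);
}

// Decode a value of type T from [p, end), advancing p past it. Intended for
// unsigned types of at least 32 bits. Throws if the buffer ends mid-value or
// if the encoded value doesn't fit in T.
template <typename T>
T decode_uint(const char*& p, const char* end) {
    T result = 0;
    unsigned shift = 0;
    for (;;) {
        if (p == end) throw std::runtime_error("insufficient data");
        const unsigned char byte = static_cast<unsigned char>(*p++);
        const T chunk = static_cast<T>(byte & 0x7f);
        // Reject bits that would be shifted out of T.
        if (shift >= sizeof(T) * 8 ||
            static_cast<T>(chunk << shift) >> shift != chunk)
            throw std::range_error("value too wide for type");
        result |= static_cast<T>(chunk << shift);
        if (!(byte & 0x80)) return result;
        shift += 7;
    }
}
```

Because the byte stream carries no fixed width, a 64-bit build and a 32-bit build can talk to each other as long as the values fit, which matches the expectation that the protocol should "just work".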
> Right now, the sanest approach is probably just for people who actually
> need it to enable it - if you're handling more than 4 billion documents,
> having to work with a specially built package isn't likely to be a huge
> deal.
Yes, that's what I thought. I was thinking that keeping around a patch
that applies reasonably cleanly would mean that people with that
requirement could build it 64 bit and see what happens. Once we get a bit
of practical feedback that it actually works, it could be built in
properly as a compile option.
--
Ticket URL: <http://trac.xapian.org/ticket/385#comment:3>
Xapian <http://xapian.org/>