[Xapian-devel] docid type redefine
olly at survex.com
Fri Aug 19 16:02:08 BST 2005
On Fri, Aug 19, 2005 at 06:15:35PM +0400, Pronichev Alexander wrote:
> Sorry for my English, I'm not a native speaker. ;-)
Don't be, it's excellent.
> I am indexing objects from a relational SQL db (by object id). The point
> is that I have several daemons which hand out object ids from
> different ranges (6 bytes total) at the same time, and I can ask any of
> them (at random). That's why I can have objects with such large ids. But
> there are certainly far fewer than 4 billion documents in one Xapian db
> at any one time.
In this case, I'd recommend using a unique term to hold the object id
instead of trying to keep Xapian's docids in sync with the object ids. So
the document for object id 1234 might be indexed by the term U1234.
This adds a small overhead per document, but the sparse docids you get
by using the object id as the docid work against the compression schemes
Xapian uses, so letting Xapian allocate non-sparse docids will result in
a smaller Xapian database overall.
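A rough illustration of why sparse docids hurt compression: posting lists are typically stored as gaps between consecutive docids, and small gaps take fewer bytes. The sketch below assumes a generic 7-bits-per-byte variable-length encoding of gaps; this is an illustrative scheme, not Xapian's actual on-disk format.

```python
def varint_len(n):
    """Bytes needed to store non-negative n using 7 data bits per byte."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def delta_encoded_size(docids):
    """Total bytes for a sorted docid list stored as varint-encoded gaps."""
    size, prev = 0, 0
    for docid in docids:
        size += varint_len(docid - prev)
        prev = docid
    return size


# 1000 documents with dense docids (what Xapian would allocate itself)...
dense = list(range(1, 1001))
# ...versus the same 1000 documents spread over a 48-bit id space
# (hypothetical object ids used directly as docids).
sparse = [(i + 1) << 38 for i in range(1000)]

print(delta_encoded_size(dense), delta_encoded_size(sparse))
```

Under this scheme every dense gap is 1 and fits in one byte, while each sparse gap of 2**38 needs six bytes, so the same list is about six times larger.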
You're also adding overhead by forcing the use of 64-bit arithmetic inside Xapian.
> > However, I think you'll need to change both termcount and doccount
> > anyway, since otherwise the collection frequency (which is returned as
> > a termcount) could overflow. The problem is that the collection
> > frequency is effectively doccount * termcount, so it needs to be the
> > larger of the two types (or ideally a type large enough to hold the
> > product of the two); currently it's returned as termcount.
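The overflow concern can be checked with simple arithmetic. This sketch assumes a hypothetical corpus of 100 million documents, each containing one term 50 times: both the document count and the per-document wdf fit comfortably in 32 bits, but their product (the collection frequency) does not, and a 32-bit unsigned counter would silently wrap.

```python
UINT32_MAX = 2 ** 32 - 1


def wrapped_uint32(n):
    """Value an unsigned 32-bit counter would hold after accumulating n."""
    return n % (UINT32_MAX + 1)


# Hypothetical corpus: 100 million documents, each containing the term 50 times.
doccount = 100_000_000
wdf_per_doc = 50
cf = doccount * wdf_per_doc   # collection frequency = sum of wdf over documents

print(cf, wrapped_uint32(cf))
# In the worst case the product of two 32-bit counts needs a full 64 bits.
print((UINT32_MAX * UINT32_MAX).bit_length())
```

Here the true collection frequency is 5,000,000,000 but a 32-bit termcount would report 705,032,704, which is why the return type needs to be at least as wide as the product.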
> Yes. In the end I patched it like that, but it doesn't work. Documents
> are successfully added to the db (at least the doccount() method returns
> the correct value), but when I search for these documents the mset
> iterator size is 0. When I index documents with ids that fit in an
> unsigned int, it works correctly. I'm using the Perl API.
It sounds like something in Xapian isn't using doccount/docid/termcount
where it should (and is using unsigned int or similar directly), or
perhaps the Perl XS glue isn't passing 64-bit integers from Perl to
Xapian properly.
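One way to picture the suspected bug: if any layer between the caller and the library stores a docid in a 32-bit unsigned C integer, values above 2**32 - 1 are silently truncated. The sketch below uses Python's `ctypes.c_uint32`, which performs no overflow checking, to stand in for such a narrowing step; `pass_as_uint32` and `checked_uint32` are hypothetical helpers, and this is an illustration of one possible failure mode, not a diagnosis of the actual Perl XS or Xapian code.

```python
import ctypes

UINT32_MAX = 2 ** 32 - 1


def pass_as_uint32(value):
    """Hypothetical glue step: store value in a C unsigned 32-bit int (silently truncates)."""
    return ctypes.c_uint32(value).value


def checked_uint32(value):
    """Hypothetical safer variant: refuse values that do not fit instead of truncating."""
    if not 0 <= value <= UINT32_MAX:
        raise OverflowError(f"{value} does not fit in 32 bits")
    return value


big_object_id = (1 << 32) + 1234   # a 6-byte id just above the 32-bit range
small_object_id = 1234

# Both ids arrive as the same docid once narrowed.
print(pass_as_uint32(big_object_id), pass_as_uint32(small_object_id))
```

If something narrows the value like this, a document can end up stored or looked up under a different docid than intended, which could plausibly explain missing matches; a check like `checked_uint32` at each boundary would surface the problem as an error rather than as silently empty results.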
Can you produce a small self-contained example which shows the problem?
In Perl is fine.