[Xapian-discuss] problem with multi-database search, xapian 0.9.10

Olly Betts olly at survex.com
Sat Feb 16 20:31:31 GMT 2008


On Tue, Feb 12, 2008 at 07:38:24PM +0600, Vasiliy Sergeev wrote:
> I suspect 32bit overflowing. The same search in single DB results with 
> document ids like 4236476938 which is strangely close to the MAX of 
> 32bit integer.

Yes, the multidb code doesn't currently check for the mapped docid
wrapping round.

> I plan to migrate to xapian 1.0.5 but I want to know why this problem 
> appears to be sure it won't happen with latest xapian version.

This is unchanged in 1.0.5.

If you make use of more than half the docid space in each of the two
subdatabases, there's not much we can do.  We need to map the docids
from the subdatabases to/from the docids of the combined database.  So
the "fix" would be to throw an exception in this case, which isn't going
to help you much...
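
To make the wrap-around concrete, here is a minimal sketch of the mapping,
assuming the interleaved scheme combined = (sub_docid - 1) * n_dbs +
db_index + 1 and a 32-bit docid.  The helper names are hypothetical and not
part of the Xapian API; the "checked" variant only illustrates what
detecting the overflow would look like.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumption: docids are 32-bit unsigned, as in a default build.
using docid32 = std::uint32_t;

// Hypothetical helper: maps a subdatabase docid into the combined docid
// space with no overflow check.  Unsigned arithmetic wraps modulo 2^32.
docid32 map_docid_unchecked(docid32 sub_docid, docid32 n_dbs, docid32 db_index) {
    return (sub_docid - 1) * n_dbs + db_index + 1;
}

// Hypothetical helper: same mapping computed in 64 bits so the overflow
// can be detected.  Returns false if the result does not fit in 32 bits.
bool map_docid_checked(docid32 sub_docid, docid32 n_dbs, docid32 db_index,
                       docid32 &combined) {
    assert(sub_docid != 0 && db_index < n_dbs);
    const std::uint64_t wide =
        (static_cast<std::uint64_t>(sub_docid) - 1) * n_dbs + db_index + 1;
    if (wide > std::numeric_limits<docid32>::max()) {
        return false;  // mapped docid would wrap round
    }
    combined = static_cast<docid32>(wide);
    return true;
}
```

With two subdatabases, subdatabase docid 2147483649 (2^31 + 1) in the first
database maps to 2^32 + 1, which wraps to 1 in the unchecked version and so
collides with a genuine docid 1.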

I assume you don't actually have 4 billion documents in each database?
If you do, then your only option is to recompile Xapian with a 64-bit
Xapian::docid type.

Although you can set your own docids to create a sparse usage pattern,
it's probably not a good idea.  The backend uses delta encoding on
docids to compress posting lists, so sparse docids mean larger gaps and
the compression won't be as good.  You'll probably waste more space than you save by not
storing the UID as a term, and less compressed posting lists affect all
searches whereas the UID terms won't have much overhead at search time
(since they'll all appear adjacently in the Btree).
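
As a rough illustration of why sparse docids compress worse, the sketch
below stores the gaps between consecutive docids using a generic
7-bits-per-byte variable-length integer.  This is an assumption made for
illustration only and is not the backend's actual on-disk format; the
function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes needed to store v in a generic 7-bits-per-byte varint
// (illustrative encoding, not Xapian's real posting list format).
std::size_t varint_size(std::uint32_t v) {
    std::size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        ++n;
    }
    return n;
}

// Total bytes to store a sorted, strictly increasing docid list when
// each entry is written as the gap from the previous docid.
std::size_t delta_encoded_size(const std::vector<std::uint32_t> &docids) {
    std::size_t total = 0;
    std::uint32_t prev = 0;
    for (std::uint32_t id : docids) {
        total += varint_size(id - prev);
        prev = id;
    }
    return total;
}
```

Under this encoding, four dense docids {1, 2, 3, 4} take one byte each,
while four docids spaced a million apart take three bytes each, and that
cost is paid in every posting list.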

Cheers,
    Olly


