[Xapian-discuss] problem with multi-database search, xapian 0.9.10

Olly Betts olly at survex.com
Mon Feb 18 14:42:36 GMT 2008


On Mon, Feb 18, 2008 at 02:19:09PM +0600, Vasiliy Sergeev wrote:
> It seems to me that xapian decided to start the January and February DBs 
> from a value very close to MAX_INT.

I've never seen that happen before, assuming you're not using
replace_document() to set the document ids.  If you do that, then a later
add_document() call, where you don't specify an id, is given the next id
after the highest one used so far.

Is this repeatable?

> Is there any way to shift them?
> Can the xapian utilities do such a thing?

By default, xapian-compact will shift docids down to remove any "gap" at
the start of the numbering.  It doesn't remove gaps between used
document ids though (that can't be done so cheaply).
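
As a rough illustration of that renumbering (a sketch only, not
xapian-compact's actual code; the function name shift_down and the sample
docids are made up for the example):

```python
def shift_down(docids):
    """Remove the unused gap before the first docid; later gaps are kept."""
    if not docids:
        return []
    # Everything below the smallest used docid is the "gap" at the start.
    offset = min(docids) - 1
    return [docid - offset for docid in docids]


# Hypothetical docids that start close to 2**31 - 1.
before = [2147483001, 2147483002, 2147483010]
after = shift_down(before)
print(after)  # [1, 2, 10]
```

The gap between the second and third ids survives, since only the offset at
the start of the numbering is removed.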

> Or maybe there is a way for me to set the first docid to 1?

It should be 1 by default.

> > Although you can set your own docids to create a sparse usage pattern,
> > it's probably not a good idea to.  The backend uses delta encoding on
> > docids to compress posting lists, which means that the compression won't
> > be as good.  You'll probably waste more space than you save by not
> > storing the UID as a term, and less compressed posting lists affect all
> > searches whereas the UID terms won't have much overhead at search time
> > (since they'll all appear adjacently in the Btree).
>
> Could you please explain this part?

I assumed you were setting the document ids explicitly, but it sounds
like you aren't.
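
For reference, the point in the quoted paragraph about delta encoding can be
shown with a small sketch.  This is not Xapian's actual on-disk format: the
7-bits-per-byte variable-length integer and the names varint_size and
posting_list_size are assumptions made purely for illustration.

```python
def varint_size(n):
    """Bytes needed to store non-negative n at 7 payload bits per byte."""
    size = 1
    while n >= 0x80:
        n >>= 7
        size += 1
    return size


def posting_list_size(docids):
    """Total bytes when each docid is stored as the gap from the previous one."""
    total = 0
    previous = 0
    for docid in sorted(docids):
        total += varint_size(docid - previous)
        previous = docid
    return total


# Same number of postings, different spacing between docids.
dense = list(range(1, 1001))                    # every gap is 1
sparse = [i * 100000 for i in range(1, 1001)]   # every gap is 100000

dense_bytes = posting_list_size(dense)
sparse_bytes = posting_list_size(sparse)
print(dense_bytes, sparse_bytes)  # 1000 3000
```

With this toy encoding the sparse list takes three times the space for the
same number of entries, which is the effect described above: larger gaps
between docids compress less well.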

Cheers,
    Olly


