[Xapian-discuss] problem with multi-database search, xapian 0.9.10
Olly Betts
olly at survex.com
Mon Feb 18 14:42:36 GMT 2008
On Mon, Feb 18, 2008 at 02:19:09PM +0600, Vasiliy Sergeev wrote:
> It seems to me that Xapian decided to start the January and February DBs
> from some value very close to MAX_INT.
I've never seen that happen before, assuming you're not using
replace_document() to set the document ids. If you do that, then
add_document() will use one more than the highest document id used so
far when you don't specify one.
Is this repeatable?
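
To illustrate how an explicitly set docid could push later automatic docids
close to MAX_INT, here is a minimal Python sketch. It is a toy model, not the
Xapian API: the class `ToyDatabase` and its attributes are hypothetical, and it
assumes that automatic assignment hands out one more than the highest docid
used so far.

```python
class ToyDatabase:
    """Toy model of docid assignment (hypothetical, not the Xapian API)."""

    def __init__(self):
        self.docs = {}       # docid -> document data
        self.last_docid = 0  # highest docid used so far

    def add_document(self, data):
        # Automatic assignment: assumed to be one past the highest docid used.
        self.last_docid += 1
        self.docs[self.last_docid] = data
        return self.last_docid

    def replace_document(self, docid, data):
        # Explicit docid: may push the high-water mark far up.
        self.docs[docid] = data
        self.last_docid = max(self.last_docid, docid)


db = ToyDatabase()
first = db.add_document("a")           # gets docid 1
db.replace_document(2**32 - 10, "b")   # explicit, very large docid
next_auto = db.add_document("c")       # continues just above the large docid
```

In this model a single replace_document() call with a huge docid is enough to
make every later automatically assigned docid huge as well.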
> Is there any way to
> shift them?
> Can the Xapian utilities do such a thing?
By default, xapian-compact will shift docids down to remove any "gap" at
the start of the numbering. It doesn't remove gaps between used
document ids though (that can't be done so cheaply).
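
The effect of that renumbering can be sketched in Python as follows. This only
models the result described above (the leading gap goes away, gaps between used
docids stay); the function name `shift_docids_down` is made up for illustration
and is not how xapian-compact is implemented.

```python
def shift_docids_down(docids):
    """Renumber docids so the lowest becomes 1, keeping gaps between ids."""
    if not docids:
        return []
    offset = min(docids) - 1  # size of the unused gap at the start
    return [d - offset for d in docids]


# The gap before 1000001 is removed; the gap between 2 and 5 remains.
shifted = shift_docids_down([1000001, 1000002, 1000005])
```

Subtracting one constant from every docid is cheap, whereas closing the gaps
between used docids would need a full old-to-new mapping.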
> Or maybe there is a way for me to set the first docid to 1.
It should be 1 by default.
> > Although you can set your own docids to create a sparse usage pattern,
> > it's probably not a good idea to. The backend uses delta encoding on
> > docids to compress posting lists, which means that the compression won't
> > be as good. You'll probably waste more space than you save by not
> > storing the UID as a term, and less compressed posting lists affect all
> > searches whereas the UID terms won't have much overhead at search time
> > (since they'll all appear adjacently in the Btree).
>
> Could you please explain this part?
I assumed you were setting the document ids explicitly, but it sounds
like you aren't.
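
For reference, the point about delta encoding can be shown with a small Python
sketch. It assumes a generic variable-length integer scheme (7 payload bits per
byte) applied to the gaps between sorted docids; the real backend format
differs in detail, and the helper names here are hypothetical.

```python
def varint_len(n):
    """Bytes needed to store n using 7 payload bits per byte."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def posting_list_size(docids):
    """Total bytes for a sorted docid list stored as gaps from the previous id."""
    size, prev = 0, 0
    for d in docids:
        size += varint_len(d - prev)  # store the delta, not the docid itself
        prev = d
    return size


dense = list(range(1, 1001))                    # docids 1..1000
sparse = [i * 100000 for i in range(1, 1001)]   # same count, widely spaced
dense_size = posting_list_size(dense)
sparse_size = posting_list_size(sparse)
```

With the same number of entries, the sparse list needs more bytes per entry
under this scheme because every gap is larger, which is why a sparse docid
pattern compresses less well.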
Cheers,
Olly