[Xapian-devel] Something to think about

Olly Betts olly at survex.com
Sun Oct 14 15:25:20 BST 2007


On Sun, Oct 14, 2007 at 02:22:41PM +0100, James Aylett wrote:
> But with any method for interleaving or sequentially assigning docids
> out of multiple dbs, they won't change unless one or more of those dbs
> is changed between searches.

Umm, it's common for an index to be constantly fed with new documents.
If I mark document 123456 as relevant and hit "search" again, several
new documents will have arrived in those few seconds, so document 123456
is now a different document and the relevance feedback will give bogus
results.
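
To make the failure concrete, here is a minimal sketch of sequential
numbering.  It is not Xapian code: the function names are made up for
illustration, and it assumes each database's docids run contiguously
from 1 up to its size, with no deletions.

```python
def sequential_docid(sizes, shard, local):
    """Global docid when shard 0's documents are numbered first, then shard 1's, etc.

    `sizes` is a hypothetical list of per-database document counts,
    `shard` is a 0-based database index, `local` is the 1-based docid
    inside that database.
    """
    return sum(sizes[:shard]) + local


def resolve_sequential(sizes, docid):
    """Map a global docid back to (shard, local) under the same scheme."""
    for shard, size in enumerate(sizes):
        if docid <= size:
            return shard, docid
        docid -= size
    raise KeyError("docid out of range")


sizes = [100, 50]

# The user marks local document 7 of the second database as relevant.
marked = sequential_docid(sizes, 1, 7)

# Three new documents arrive in the FIRST database before the next search.
sizes[0] += 3

# The docid the user marked now resolves to a different local document.
now_points_at = resolve_sequential(sizes, marked)
```

Here `marked` is 107, but after the first database grows, docid 107
resolves to local document 4 of the second database rather than 7, so
the feedback would be applied to the wrong document.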

> So I still think it's an edge case. I think there's something I'm
> missing here...

I don't see how that's an "edge case" - if you're constantly adding new
documents, it'll happen all the time.

> > And for things like marking documents for relevance feedback, stability
> > of docids between searches is pretty much essential, which is why
> > we originally chose the interleaving scheme.
> 
> Umm, okay. So we need to think about how to assign the gap to retain
> that for a little longer. That's harder, but could be a hint on
> multi-db open or something. Not ideal, though :-(

If we support something like this, I think it would have to be down to
the user to manage in their own code.  After all, only they will know
what they want to do if the "buffer zone" proves too small...

But a fairly common design is for only one database to receive updates
(with older databases periodically merged, or the oldest one just
dropped entirely).  In that case, you just need to make sure that the
"live" database is last, and numbering sequentially will work.
Sequential numbering also works fine if you don't care about stable
docids in your application.  So I think perhaps it's best to keep it
simple and point people to the interleaved approach if they want
stable docids in multi-databases.
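
A rough sketch of the two cases above, again with made-up function
names rather than real Xapian calls.  It assumes the interleaved
mapping is (local - 1) * n + index + 1 for n databases with a 0-based
database index, and that docids within each database are contiguous
from 1.

```python
def interleaved_docid(n_shards, shard, local):
    """Global docid under interleaving; depends only on the number of databases."""
    return (local - 1) * n_shards + shard + 1


def resolve_interleaved(n_shards, docid):
    """Map an interleaved global docid back to (shard, local)."""
    return (docid - 1) % n_shards, (docid - 1) // n_shards + 1


def sequential_docid(sizes, shard, local):
    """Global docid under sequential numbering (hypothetical helper)."""
    return sum(sizes[:shard]) + local


# Interleaved: database sizes never enter the formula, so adding
# documents to any database leaves existing docids alone.
interleaved = interleaved_docid(2, 1, 7)
round_trip = resolve_interleaved(2, interleaved)

# Sequential with the "live" database last: only the last size changes,
# and no docid depends on the size of the last database.
sizes = [100, 50]
before = (sequential_docid(sizes, 0, 42), sequential_docid(sizes, 1, 7))
sizes[-1] += 3  # new documents arrive in the live database
after = (sequential_docid(sizes, 0, 42), sequential_docid(sizes, 1, 7))
```

In this sketch `before` and `after` are equal, which is the sense in
which sequential numbering is safe when only the last database grows.
Merging or dropping an older database would still renumber everything
after it under the sequential scheme.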

Cheers,
    Olly


