[Xapian-discuss] Filtering queries with many boolean terms

Olly Betts olly at survex.com
Thu Oct 8 01:07:43 BST 2009


On Wed, Oct 07, 2009 at 02:06:49PM -0400, Jason Tackaberry wrote:
> On Tue, 2009-10-06 at 11:00 +0100, Olly Betts wrote:
> > Currently we rewrite all the terms if any are changed,
> 
> Is most of that rewriting work done at flush time?  Because even
> excluding flush, currently updating 1000 documents takes about 3 seconds
> (on my system with a 200k document database).  That's still quite a bit
> slower than I'd like.  Do you anticipate any performance improvement
> with replace_document() as well, or just flush()?

Unnecessary extra work happens both when replace_document() is called
and when flush() is called (or happens implicitly).

> > But this looks odd to me.  When you say "query time increases", what
> > are you actually timing here?  If it includes the query parsing time
> > then I suspect the quadratic behaviour is probably there.
> 
> You're right on the money here.  I was measuring parse time as well.  I
> had the flawed intuition that parse time should be negligible compared
> to search time so ended up including it in my timing calculation.

(On the second attempt) I seem to have managed to write a testcase for
this scaling problem.  I'll investigate what's causing it.
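
For anyone wanting to check where the time goes in their own setup, here
is a minimal sketch (not the testcase mentioned above) that times the
parse step and the search step separately.  The names time_phases,
fake_parse and fake_search are hypothetical; the two stand-in functions
only keep the example self-contained and do not model Xapian's behaviour.

```python
import time


def time_phases(parse, search, query_string):
    """Time the parse and search steps separately.

    `parse` turns a query string into a query object, and `search`
    runs that query.  Returns (parse_seconds, search_seconds, result).
    """
    t0 = time.perf_counter()
    query = parse(query_string)
    t1 = time.perf_counter()
    result = search(query)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, result


def fake_parse(query_string):
    # Hypothetical stand-in for a real query parser: split into terms.
    return query_string.split()


def fake_search(terms):
    # Hypothetical stand-in for a real search: count the terms.
    return len(terms)


if __name__ == "__main__":
    # Double the number of boolean-style terms each round.  If the parse
    # time roughly quadruples per doubling, that suggests quadratic growth;
    # roughly doubling suggests linear growth.
    for n in (250, 500, 1000, 2000):
        qs = " ".join("tag:t%d" % i for i in range(n))
        parse_s, search_s, _ = time_phases(fake_parse, fake_search, qs)
        print("%5d terms: parse %.6fs  search %.6fs" % (n, parse_s, search_s))
```

With the Xapian Python bindings, `parse` would typically wrap
QueryParser.parse_query() and `search` would wrap Enquire.set_query()
followed by Enquire.get_mset(); that wiring is left out here because it
depends on the database and prefixes in use.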

Cheers,
    Olly


