[Xapian-discuss] Filtering queries with many boolean terms

Jason Tackaberry tack at urandom.ca
Wed Oct 7 19:06:49 BST 2009


Hi Olly,

On Tue, 2009-10-06 at 11:00 +0100, Olly Betts wrote:
> Currently we rewrite all the terms if any are changed,
[...]
> So one approach would just be to live with the 40 seconds for now and
> you'll get a nice speed up when that optimisation is implemented.

Is most of that rewriting work done at flush time?  I ask because, even
excluding flush(), updating 1000 documents currently takes about 3
seconds (on my system, with a 200k-document database).  That's still
quite a bit slower than I'd like.  Do you anticipate any performance
improvement in replace_document() as well, or just in flush()?


> But this looks odd to me.  When you say "query time increases", what
> are you actually timing here?  If it includes the query parsing time
> then I suspect the quadratic behaviour is probably there.

You're right on the money here.  I was measuring parse time as well.  I
had the flawed intuition that parse time would be negligible compared
to search time, so I ended up including it in my timing calculation.
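
For illustration, a minimal sketch (in Python) of timing the two phases
separately, so that parse cost is not folded into the search figure.
The parse and search callables are hypothetical stand-ins, not Xapian
API; in practice they would wrap whatever turns the query string into a
query object and whatever runs the match.

```python
import time


def time_phases(parse, search, query_string):
    """Time query parsing and searching as two separate measurements.

    `parse` and `search` are hypothetical stand-in callables supplied by
    the caller; nothing here is specific to any search library.
    """
    t0 = time.perf_counter()
    query = parse(query_string)      # phase 1: build the query object
    t1 = time.perf_counter()
    results = search(query)          # phase 2: run the search itself
    t2 = time.perf_counter()
    return {
        "parse_s": t1 - t0,
        "search_s": t2 - t1,
        "results": results,
    }
```

Comparing parse_s and search_s as the number of boolean terms grows
should show which phase is responsible for any superlinear slowdown.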

Indeed, when I construct the Query programmatically as you suggested,
the performance win is significant.  The whole approach looks quite
feasible now, and your explanation makes good sense.  Thank you.

Cheers,
Jason.



