[Xapian-discuss] Reasonable Time Expectation for Long Queries?

Olly Betts olly at survex.com
Thu Apr 12 22:27:19 BST 2007


On Thu, Apr 12, 2007 at 10:10:40AM +0100, Olly Betts wrote:
> On Thu, Apr 12, 2007 at 12:33:14PM +0900, Josef Novak wrote:
> > My current query code, taken
> > from one of the examples, looks like:
> > Xapian::Query query(Xapian::Query::OP_OR, &string_tokens[0],
> > &string_tokens[string_tokens.size()]);

It seems more natural to say:

    Xapian::Query query(Xapian::Query::OP_OR, string_tokens.begin(), string_tokens.end());

But if string_tokens is a vector, your version should work fine.
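
To illustrate why the two spellings are interchangeable for a vector,
here is a minimal sketch.  It does not use Xapian at all: describe_or()
and same_range() are hypothetical stand-ins I've made up so that the
example compiles on its own, with describe_or() playing the part of a
constructor templated on an iterator type.  The pointer pair is written
as first + size() here, which denotes the same one-past-the-end
position as the iterator returned by end().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a constructor templated on an iterator type.
// It joins the terms in [begin, end) with " OR " so the range is visible.
template <class Iterator>
std::string describe_or(Iterator begin, Iterator end) {
    std::string out;
    bool first_term = true;
    for (Iterator it = begin; it != end; ++it) {
        if (!first_term) out += " OR ";
        out += *it;
        first_term = false;
    }
    return out;
}

// Returns true if a pointer pair and an iterator pair over the same
// vector describe the same sequence of terms.
bool same_range(const std::vector<std::string>& terms) {
    if (terms.empty()) {
        // Nothing to take the address of; the iterator range is empty too.
        return describe_or(terms.begin(), terms.end()).empty();
    }
    const std::string* first = &terms[0];
    const std::string* last = first + terms.size();  // one past the end
    // Vector storage is contiguous, so the last element sits just before `last`.
    assert(last - 1 == &terms.back());
    return describe_or(first, last) == describe_or(terms.begin(), terms.end());
}
```

Either pair of arguments walks the same elements in the same order, so
which one to use is mostly a matter of taste.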

> > Is there anything else I can do to optimize these simple OP_OR queries?  Are
> > there any other suggestions for optimization, or pointers to places in the
> > lists where this has been discussed, with fruitful results?
> 
> It sounds like the same issue as this, except that was building the
> query up pairwise, and using the "in one go" constructor was the
> workaround in that case:
> 
> http://thread.gmane.org/gmane.comp.search.xapian.general/3974
> 
> I thought I'd worked on a fix for that, but I don't seem to have checked
> anything in.  I probably unpatched it to work on something else - I'll
> dig it out.

I couldn't find it (perhaps I just looked at the code and thought about
how to solve it).  Anyway, I've now written and committed a fix for that
- now we only validate a Query::Internal when it's fully constructed, or
if we modify it.

> Do you have a self-contained (except for Xapian!) small program which
> shows this?  Failing that, some example lists of terms which build into
> slow queries?

I modified simplesearch.cc to remove the call to Enquire::set_query()
and everything afterwards (so I was just looking at the query parsing
time), and then tested it using:

    perl -e 'print join " ", (1..10000)' | xargs time examples/simplesearch tmp.db

Before the change above, SVN HEAD takes only about 3 seconds to build an
OR query from even 10000 terms.  100 terms take negligible time (it's
essentially lost in the noise).  The change above seems to speed up the
10000-term case a little.

So I'm going to need an example which demonstrates your problem.

Cheers,
    Olly


