[Xapian-discuss] Slow phrase performance

Mark Malloy alabammy.skwirl at gmail.com
Fri Sep 30 17:49:59 BST 2011


I've been getting excellent performance out of Xapian, but phrase
searches on common terms, such as [ "north america" ] or
[ "art history" ], take a very long time to come up with results.

Examples:
------------------------------
[ south africa ] -- 10379 results found in ~0.2 sec
[ white house ] -- 17988 results found in <1 sec
Quoting either of those queries ends up timing out my web request to
the search program (60+ seconds, I think).

[ "kansas state" ] -- 334 results found in ~22 sec; the unquoted query
finds 3719 results in <0.1 sec
------------------------------


I keep thinking that I've got something misconfigured or am not
formatting my request properly.  Here's an example of the "kansas
state" query after going through the QueryParser
($xapquery->get_description):
------------------------------
Xapian::Query((Ckansas:(pos=1) PHRASE 2 Cstate:(pos=2)))
------------------------------


Hopefully these additional details will help:
------------------------------
The database holds approx. 200,000 documents, with around 8GB of
source text.  I have tried finding the most infrequent term within the
phrase and using it as a boolean filter on a separate boolean_prefix,
as well as using it to limit the maximum number of matches, but
neither seems to help on common terms.  I am not limiting the number
of matches -- I want all of them.  But even on small result sets like
"kansas state", with just 334 results, it's taking a long time.

The following are set:
  $qp->set_default_op(OP_AND);
  $qp->set_stemming_strategy(STEM_NONE);

Flags enabled: BOOLEAN,BOOLEAN_ANY_CASE,WILDCARD,PHRASE
------------------------------
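
To make the rarest-term idea above concrete, here is a minimal sketch.
It is a toy in-memory illustration in Python, not my actual Perl code
and not a description of how Xapian works internally.  The index layout
and the function names (rarest_term, phrase_matches) are made up for
this example.  It picks the phrase term found in the fewest documents,
narrows the candidates with a boolean AND, and only then checks the
positions:

```python
from typing import Dict, List, Set

# Hypothetical toy positional index: term -> {doc_id: sorted positions}.
PositionalIndex = Dict[str, Dict[int, List[int]]]


def rarest_term(terms: List[str], index: PositionalIndex) -> str:
    """Return the phrase term that occurs in the fewest documents."""
    return min(terms, key=lambda t: len(index.get(t, {})))


def phrase_matches(terms: List[str], index: PositionalIndex) -> Set[int]:
    """Return ids of documents containing the terms as an exact phrase."""
    if not terms:
        return set()
    # Start from the rarest term's documents to keep the candidate set small.
    candidates = set(index.get(rarest_term(terms, index), {}))
    # Boolean AND: keep only documents that contain every term.
    for term in terms:
        candidates &= set(index.get(term, {}))
    matches = set()
    for doc_id in candidates:
        later = [set(index[t][doc_id]) for t in terms[1:]]
        # Position check: term i must sit exactly i places after the first.
        for start in index[terms[0]][doc_id]:
            if all(start + i + 1 in pos for i, pos in enumerate(later)):
                matches.add(doc_id)
                break
    return matches


# Made-up data: "kansas" is in three documents, "state" in two.
index = {
    "kansas": {1: [1], 2: [5], 3: [2]},
    "state": {1: [2], 2: [9]},
}
print(rarest_term(["kansas", "state"], index))
print(sorted(phrase_matches(["kansas", "state"], index)))
```

In this sketch the AND step only touches document ids, while the
phrase step has to walk the position lists of every surviving
candidate.  When both terms are common, many candidates survive
however rare the chosen term is.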

I have been unable to find reports of other people experiencing the
same problem with poor phrase performance, so I am hoping it's simply
something that I'm doing wrong or inefficiently.  Any help would be
appreciated.

Thanks!


