[Xapian-discuss] Slow phrase performance
Mark Malloy
alabammy.skwirl at gmail.com
Fri Sep 30 17:49:59 BST 2011
I've been getting excellent performance out of Xapian, but phrase
searches on common terms, such as [ "north america" ] or
[ "art history" ], take a very long time to return results.
Examples:
------------------------------
[ south africa ] -- 10379 results found in ~0.2 sec
[ white house ] -- 17988 results found in <1 sec
Quoting either of those queries times out my web request to the
search program (60+ seconds, I think).
[ "kansas state" ] -- 334 results found in ~22 sec; 3719 results found
in <0.1 sec when unquoted
------------------------------
I keep thinking that I've got something misconfigured or am not
formatting my request properly. Here's an example of the "kansas
state" query after going through the QueryParser
($xapquery->get_description):
------------------------------
Xapian::Query((Ckansas:(pos=1) PHRASE 2 Cstate:(pos=2)))
------------------------------
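To make the semantics of that description concrete, here is a minimal pure-Perl sketch of what a two-term PHRASE with a window of 2 asks for: both terms must be in the document, and the second must sit at the position right after the first. This is a toy model only, not Xapian's implementation. The index layout, the sample data, and the helper names and_match and phrase_match are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical toy positional index: term => { docid => [positions] }.
my %index = (
    kansas => { 1 => [3, 17], 2 => [5],  3 => [9] },
    state  => { 1 => [4],     2 => [40], 4 => [2] },
);

# Documents that contain every term (what the unquoted AND query needs).
sub and_match {
    my ($index, @terms) = @_;
    my @docs;
    DOC: for my $doc (sort { $a <=> $b } keys %{ $index->{ $terms[0] } || {} }) {
        for my $term (@terms[1 .. $#terms]) {
            next DOC unless exists $index->{$term} && exists $index->{$term}{$doc};
        }
        push @docs, $doc;
    }
    return @docs;
}

# Documents where the terms occur at consecutive positions, in order.
# Every AND candidate needs its position lists inspected.
sub phrase_match {
    my ($index, @terms) = @_;
    my @hits;
    for my $doc (and_match($index, @terms)) {
        # Position sets for each term after the first, for this document.
        my @seen;
        for my $i (1 .. $#terms) {
            my %set;
            $set{$_} = 1 for @{ $index->{ $terms[$i] }{$doc} };
            $seen[$i] = \%set;
        }
        for my $start (@{ $index->{ $terms[0] }{$doc} }) {
            my $ok = 1;
            for my $i (1 .. $#terms) {
                $ok = 0 unless $seen[$i]{ $start + $i };
            }
            if ($ok) { push @hits, $doc; last; }
        }
    }
    return @hits;
}

my @and    = and_match(\%index, qw(kansas state));
my @phrase = phrase_match(\%index, qw(kansas state));
print "AND: @and\n";        # AND: 1 2
print "PHRASE: @phrase\n";  # PHRASE: 1
```

In this toy model the phrase result is always a subset of the AND result, which is consistent with the 334 quoted versus 3719 unquoted matches reported above.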
Hopefully these additional details will help:
------------------------------
The database holds approximately 200,000 documents, built from around
8 GB of source text. I have tried finding the least frequent term
within the phrase and using that as a boolean filter on a separate
boolean_prefix, as well as using it to limit the maximum number of
matches, but neither seems to help on common terms. I am not limiting
the number of matches -- I want all of them. But even on small sets
like "kansas state" with just 334 results, it's taking a long time.
The following are set:
$qp->set_default_op(OP_AND);
$qp->set_stemming_strategy(STEM_NONE);
Flags enabled: BOOLEAN,BOOLEAN_ANY_CASE,WILDCARD,PHRASE
------------------------------
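For anyone trying to reproduce this, the settings above might be assembled roughly as follows with the Search::Xapian Perl bindings. This is a sketch under assumptions: the flag names are taken to correspond to the FLAG_* constants, no database is attached, and the prefix mapping that produces the C-prefixed terms in the description is not shown in the post, so it is left out here.

```perl
use strict;
use warnings;
use Search::Xapian ':all';

# Assumption: no database is attached. Queries that actually contain a
# wildcard need $qp->set_database($db) so the wildcard can be expanded.
my $qp = Search::Xapian::QueryParser->new();
$qp->set_default_op(OP_AND);
$qp->set_stemming_strategy(STEM_NONE);

# Assumed mapping of "BOOLEAN,BOOLEAN_ANY_CASE,WILDCARD,PHRASE" to constants.
my $flags = FLAG_BOOLEAN() | FLAG_BOOLEAN_ANY_CASE()
          | FLAG_WILDCARD() | FLAG_PHRASE();

# Parse the quoted query and print the resulting query tree.
my $xapquery = $qp->parse_query('"kansas state"', $flags);
print $xapquery->get_description(), "\n";
```

Without the prefix mapping the terms in the printed description will not carry the C prefix, but the PHRASE structure should be comparable.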
I have been unable to find reports of other people experiencing the
same poor phrase performance, so I am hoping it's simply something
that I'm doing wrong or inefficiently. Any help would be appreciated.
Thanks!