[Xapian-discuss] Phrase search performance

Olly Betts olly at survex.com
Mon Feb 20 20:37:16 GMT 2006


On Mon, Feb 20, 2006 at 02:27:53PM -0500, Alex Deucher wrote:
> Is there any way to speed up phrase searches?  What sort of
> performance should I expect?  Currently when I search against a 5.3 GB
> flint database it takes 4.5 minutes for a simple 2 word phrase.  Is
> that reasonable performance?

No.

Phrase searches involving two common terms can be slow, especially
where an AND query matches many documents but the two terms don't
often occur as a phrase, but 4.5 minutes is clearly ludicrous.
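
To illustrate why that case is the expensive one, here is a toy positional index in Python. It is only a sketch of the general technique, not Xapian's actual matcher, and the names (build_index, phrase_search) and sample documents are made up for the example. The AND step picks every document containing both terms, and each of those candidates then needs its position lists checked, even though few of them turn out to contain the phrase.

```python
from collections import defaultdict


def build_index(docs):
    """Map term -> {doc_id: [positions]} for a list of token lists."""
    index = defaultdict(dict)
    for doc_id, tokens in enumerate(docs):
        for pos, term in enumerate(tokens):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_search(index, first, second):
    """Return (candidates, matches) for the two-word phrase 'first second'."""
    a = index.get(first, {})
    b = index.get(second, {})
    # The AND step: every document containing both terms is a candidate.
    candidates = sorted(set(a) & set(b))
    matches = []
    for doc_id in candidates:
        # The costly step: positional data is read for each candidate.
        second_positions = set(b[doc_id])
        if any(p + 1 in second_positions for p in a[doc_id]):
            matches.append(doc_id)
    return candidates, matches


docs = [
    "new york is a big city".split(),
    "a new bridge was built in york".split(),
    "york has a new mayor".split(),
    "the city is new".split(),
]
index = build_index(docs)
candidates, matches = phrase_search(index, "new", "york")
print(len(candidates), "candidates checked,", len(matches), "phrase match")
```

Three documents pass the AND step here but only one contains the phrase, so most of the positional checks are wasted work. With two very common terms in a large database that ratio can get much worse.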

A more concrete example would be useful - what's the query, and
what are the term frequencies for the two terms involved?

> I'm using the perl interface to xapian 0.9.2.  I'm building my own
> Query objects rather than using QueryParser since we use ':' as part
> of our field prefix.

Can't you just set the prefix map to include the ":"?  i.e.

    queryparser.add_prefix("field", "FIELD:");

> Xapian::Query((FIELD:term1 PHRASE 2 FIELD:term2))
> 
> while QueryParser's looks like:
> 
> Xapian::Query((term1(pos=1) PHRASE 2 term2(pos=2)))
> 
> where is the position information coming from and how do I add it to
> my query?

The Query constructor which takes a term name has optional parameters,
one of which sets the query position.

> Will it help or is it irrelevant?

I think it's only used to sort the query terms which match a particular
document into order (they may need to be reordered to build the query -
e.g. 'hello +world' -> 'world AND_MAYBE hello').
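
As a rough sketch of that idea (plain Python, not Xapian code; the function name and data layout are assumptions made for the illustration), each term carries the position it had in the typed query, so the matching terms can be put back into typed order even after the query has been restructured:

```python
# 'hello +world' is restructured to 'world AND_MAYBE hello', so the
# required term now comes first.  Each term keeps its original query
# position (hello=1, world=2).
restructured = [("world", 2), ("hello", 1)]


def matching_terms_in_query_order(query_terms, doc_terms):
    """Return the query terms present in doc_terms, sorted by query position."""
    present = [(term, pos) for term, pos in query_terms if term in doc_terms]
    return [term for term, pos in sorted(present, key=lambda tp: tp[1])]


ordered = matching_terms_in_query_order(restructured, {"world", "hello", "again"})
print(ordered)
```

Without the stored positions, the terms would come back in the restructured order rather than the order they were typed in.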

> The query object (at least the perl interface) only allows me to build
> queries of the form:

For Perl, see the "new_term" method of "Search::Xapian::Query" - added
in 0.9.2.3.

Cheers,
    Olly


