[Xapian-discuss] early termination in xapian?

Olly Betts olly at survex.com
Fri May 15 06:07:19 BST 2009


On Thu, May 14, 2009 at 01:48:25PM +0200, Daniel Etzold wrote:
> I have a query like "q1 OR q2 OR q3" and the result set should be ranked 
> by relevance. For that case I'd like to know whether the matcher does 
> have to read the complete posting lists for q1, q2 and q3 or whether the 
> matcher supports some kind of early termination so that only a small 
> fraction of each list of a query term has to be read.

Yes, the matcher can terminate early, and it also handles the ORs
"decaying" to AND_MAYBE or AND, which can then skip over chunks of
postings for documents which can't match.

There's an overview here:

http://xapian.org/docs/matcherdesign.html

The gory details are in the matcher subdirectory of the source tree.

> Furthermore, does the matcher first build a M-set for "q1 OR q2" and 
> than incorporates the results of q3 or does the matcher work on the 
> posting lists of each query term in parallel?

In parallel - that's much more memory efficient (since q1 OR q2 could
match every document in the database...)

Cheers,
    Olly



More information about the Xapian-discuss mailing list