use xapian.Query.OP_VALUE_RANGE or use xapian.MatchDecider?

Olly Betts olly at survex.com
Fri Jan 12 05:33:58 GMT 2018


On Thu, Jan 11, 2018 at 05:11:20PM +0800, 张少华 wrote:
> Hi, we have an index database of about 20 million products. We have
> indexed the title and description of each product as terms (in the
> posting lists), and also stored some property values in value slots,
> such as the price, comment count, production date and click number of
> each product.
> 
> Now we want to select the products which satisfy specific conditions,
> such as containing the terms "shirt" and "white", with "price <= 500",
> "comment count >= 100" and "1000 <= click_number <= 2000".
> 
> And we have two methods:
> 1. Use xapian.Query for the terms and xapian.Query.OP_VALUE_RANGE to
> filter on the values.
> 2. Use xapian.Query for the terms to get candidates, then use
> xapian.MatchDecider to filter on the values.
> 
> Which method gives better performance?

Use OP_VALUE_RANGE for a range check on a value (or OP_VALUE_LE /
OP_VALUE_GE for a one-sided check such as "price <= 500"): then the
matcher actually knows what check is being performed, which means it
can optimise better.

That's probably doubly true if you're using Xapian via one of the
bindings (which I'm guessing you are from "xapian.Query") since a
MatchDecider subclass in that language will require a call between
languages for every candidate document considered, and that's likely to
be significantly slower than staying within C++.  The matcher will try
to call it as few times as it can, but where a lot of documents match
without the filter and the filter rejects most of them, it may need to
call it millions of times if you have 20 million documents.

Cheers,
    Olly



More information about the Xapian-discuss mailing list