How to enhance the query performance for large boolean attribute

Olly Betts olly at survex.com
Thu Dec 7 04:35:19 GMT 2017


On Tue, Dec 05, 2017 at 11:01:27AM +0800, 程苏珺 wrote:
> I am a new user of Xapian, and we have run into a problem. In our case,
> a document has many boolean attributes (for example A, B, C), and our
> search queries combine filter logic such as (A == true and B == false
> ...) with other search logic.
>
> We use a MatchDecider to implement the filter logic, and we now have a
> performance problem, because our self-defined scoring method is very
> complicated and takes a lot of time. We did some analysis, and the
> boolean attribute filter (A == true and B == false ...) can filter out
> lots of docs, but it seems the MatchDecider runs after scoring, so it
> does little to improve performance.
>
> So could you please give us some suggestions for our case?

I would add a boolean term to documents where a particular attribute is
true (e.g. XA1 means attribute A is true), and then you can express your
boolean filter logic as a Query object - e.g. A == true and B == false
is:

    Xapian::Query(Xapian::Query::OP_AND_NOT,
                  Xapian::Query("XA1"),
                  Xapian::Query("XB1"))

If you're using Xapian 1.4 and writing in C++, there are operator
overloads which allow you to write that as:

    Xapian::Query("XA1") &~ Xapian::Query("XB1")

(There's no need to explicitly index a term when a boolean attribute
is false, as you can just filter out those where it is true).

You can use this approach to filter a parsed user query if you want -
just combine the parsed query and the filter using OP_FILTER.

Cheers,
    Olly


