[Xapian-discuss] QueryParser (I think) question

Richard Boulton richard at tartarus.org
Tue May 22 21:01:32 BST 2007


Andreas Marienborg wrote:
> 
> On May 22, 2007, at 4:00 PM, James Aylett wrote:
> 
>> On Tue, May 22, 2007 at 02:34:42PM +0200, Andreas Marienborg wrote:
>>
>>> I would like to search in items that are either from source 1 or
>>> source 2 for instance,
>>> so I tried the following: "source:1 OR source:2", but that results in
>>> the following query after
>>> parsing:
>>>
>>>  Xapian::Query((or:(pos=1) FILTER (S1 AND S2)))
>>>
>>> is that expected? am I trying to do something that I cannot do? or am
>>> I just doing something wrong?

There's a combination of a missing feature in the query parser here, 
together with possibly some incorrect use of the query parser.  The 
documentation could always stand improvement, too.

Firstly, I should note that I don't get the same behaviour as you here 
(with Xapian 1.0.0, and the standard options for parse_query()): I'm 
guessing from your output that you're using a Xapian 0.9 release. 
Upgrading might help a bit, but the handling isn't perfect even in 
1.0.0, sadly: query parsers are incredibly hard to get right for all cases.


Boolean filter terms are processed specially by the query parser; the 
idea is that they filter the rest of the query, rather than contributing 
to it, so they are pulled out of the query to some extent and then 
processed separately.  Any explicit operators (or brackets) will cause 
the subquery on either side of the operator to be evaluated, and then 
the two subqueries will be combined with the specified operator.

With 1.0.0, the OR operator will actually work in the situation you 
describe: for your example, I get "site:1 OR site:2" parsing to 
"Xapian::Query((H1 OR H2))" (where site is a boolean prefix for H). 
This appears to be because the two sides of the OR operator are 
evaluated as queries in their own right, and since there are no terms, 
the boolean filter aspect is ignored; and then they are combined with 
the OR operator.

However, a query like "Foo site:1 OR site:2" doesn't parse to anything 
useful: the "Foo site:1" part becomes "foo FILTER H1", and this is then 
ORed with H2.  Ideally we would get "foo FILTER (H1 OR H2)".

If I change the OR to "or", so that it is not recognised as a boolean 
operator, I get the behaviour you describe, incidentally.
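
As a toy illustration of that case sensitivity (this is not Xapian code; 
the classify_tokens function and its operator set are made up for the 
example), a parser that only treats upper-case words as operators sees 
the two spellings differently:

```python
# Toy model only: NOT the Xapian implementation.
# Assumption for illustration: only these upper-case words are operators.
OPERATORS = {"AND", "OR", "NOT"}

def classify_tokens(query):
    """Label each whitespace-separated token as 'operator' or 'term'."""
    return [(tok, "operator" if tok in OPERATORS else "term")
            for tok in query.split()]

# "OR" is picked up as an operator; "or" falls through as an ordinary term.
print(classify_tokens("site:1 OR site:2"))
print(classify_tokens("site:1 or site:2"))
```

With the lower-case spelling there is no operator left in the query, so 
the two filter terms end up being handled by the default filter logic.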

>> It's not what I'd expect. The query parser came out of omega, whose
>> documentation states that boolean terms are combined by OR for similar
>> prefixes, then AND for the different prefixes to create the overall
>> FILTER clause. Looks like this documentation is no longer correct :-/

Currently, the query parser always combines all boolean filter terms 
with AND.

I think this is a bug.  I've just checked, and this case isn't covered 
by the query parser testsuite, and there is a comment in 
queryparser.lemony (at line 1201) saying:
     // FIXME we should OR filters with the same prefix...
so it looks like support for multiple filters with the same prefix isn't 
yet implemented.
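
For what it's worth, the grouping that the omega documentation describes 
(OR within a prefix, AND across prefixes) could be sketched roughly like 
this.  It is a pure-Python illustration, not the query parser's code; the 
combine_filters function and its string output format are invented for 
the example:

```python
from collections import OrderedDict

def combine_filters(filters):
    """Combine (prefix, term) pairs: OR within a prefix, AND across prefixes.

    Hypothetical helper for illustration; returns a readable string
    rather than a real Xapian::Query object.
    """
    groups = OrderedDict()
    for prefix, term in filters:
        groups.setdefault(prefix, []).append(term)

    parts = []
    for terms in groups.values():
        if len(terms) == 1:
            parts.append(terms[0])
        else:
            parts.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(parts)

# Two filters with the same prefix are ORed together.
print(combine_filters([("site", "H1"), ("site", "H2")]))
# A filter with a different prefix is ANDed with that group.
print(combine_filters([("site", "H1"), ("site", "H2"), ("source", "S1")]))
```

The current behaviour corresponds to skipping the grouping step and 
joining every term with AND.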

FWIW, I don't think this will correspond to a bug in omega's handling of 
filters from "B" cgi parameters, since they don't use the query parser, 
and that's what the documentation you're referring to was written for, I 
believe.  But it would be nice to fix it, anyway.

> But it is expected to ignore my OR and convert that to a "regular" term?

I assume that this is happening because the query parser isn't 
recognising the OR as an operator.  Is it possible you're setting a flag 
in the parse_query() method and turning the processing of boolean 
operators off by mistake?

> if I omit the OR it produces  Xapian::Query((S1 AND S2))
> 
> I have     $self->qp->set_default_op(OP_AND);
> 
> in my setup, but commenting it out does no difference.

The default operator is for "probabilistic" terms only - it isn't 
applied to boolean terms (and I don't think it should be).

-- 
Richard


