[Xapian-discuss] QueryParser (I think) question

Andreas Marienborg andreas at startsiden.no
Wed May 23 08:26:05 BST 2007


On May 22, 2007, at 10:01 PM, Richard Boulton wrote:

> Andreas Marienborg wrote:
>> On May 22, 2007, at 4:00 PM, James Aylett wrote:
>>> On Tue, May 22, 2007 at 02:34:42PM +0200, Andreas Marienborg wrote:
>>>
>>>> I would like to search in items that are either from source 1 or
>>>> source 2 for instance, so I tried the following:
>>>> "source:1 OR source:2", but that results in the following query
>>>> after parsing:
>>>>
>>>>  Xapian::Query((or:(pos=1) FILTER (S1 AND S2)))
>>>>
>>>> Is that expected? Am I trying to do something that I cannot do,
>>>> or am I just doing something wrong?
>
> There's a combination of a missing feature in the query parser  
> here, together with possibly some incorrect use of the query  
> parser.  The documentation could always stand improvement, too.
>
> Firstly, I should note that I don't get the same behaviour as you
> here (with Xapian 1.0.0, and the standard options for
> parse_query()): I'm guessing from your output that you're using a
> Xapian 0.9 release. Upgrading might help a bit, but the handling
> isn't perfect even in 1.0.0, sadly: query parsers are incredibly
> hard to get right for all cases.
>

Yes, but I am unable to find a 1.0 release of Search::Xapian (the Perl
module) on CPAN. Is there anywhere else I can obtain it?

>
> Boolean filter terms are processed specially by the query parser;  
> the idea is that they filter the rest of the query, rather than  
> contributing to it, so they are pulled out of the query to some  
> extent and then processed separately.  Any explicit operators (or  
> brackets) will cause the subquery on either side of the operator to  
> be evaluated, and then the two subqueries will be combined with the  
> specified operator.
>
> With 1.0.0, the OR operator will actually work in the situation you  
> describe: for your example, I get "site:1 OR site:2" parsing to  
> "Xapian::Query((H1 AND H2))" (where site is a boolean prefix for  
> H). This appears to be because the two sides of the OR operator are  
> evaluated as queries in their own right, and since there are no  
> terms, the boolean filter aspect is ignored; and then they are  
> combined with the OR operator.
>
> However, a query like "Foo site:1 OR site:2" doesn't get parsed to  
> a useful thing - the "Foo site:1" bit becomes "foo FILTER H1", and  
> this is then ORred with H2; ideally we would get "foo FILTER (H1 OR  
> H2)"
>
> If I change the OR to "or", so that it is not recognised as a  
> boolean operator, I get the behaviour you describe, incidentally.
>

After experimenting a bit more, I am even more baffled:

Query: (brukervennlighet) AND (source:1 OR source:2)
unmodified query: Xapian::Query((brukervennlighet:(pos=1) AND (S1 OR  
S2)))

This actually works as intended (it passes my test cases :)

Now, if I drop the parentheses and all explicit AND/OR operators, I
get the following:

Query: brukervennlighet source:1 source:2
unmodified query: Xapian::Query((brukervennlighet:(pos=1) FILTER (S1  
AND S2)))

Adding the OR back between source:1 and source:2 gives me:

Query: brukervennlighet source:1 OR source:2
unmodified query: Xapian::Query(((brukervennlighet:(pos=1) FILTER S1)  
OR S2))


But neither of those is what you got, so I guess something changed
in 1.0 here.

I will redo the tests once I get 1.0 installed here.

>>> It's not what I'd expect. The query parser came out of omega, whose
>>> documentation states that boolean terms are combined by OR for  
>>> similar
>>> prefixes, then AND for the different prefixes to create the overall
>>> FILTER clause. Looks like this documentation is no longer  
>>> correct :-/
>
> Currently, the query parser always combines all boolean filter  
> terms with AND.
>
> I think this is a bug.  I've just checked, and this case isn't  
> covered by the query parser testsuite, and there is a comment in  
> queryparser.lemony (at line 1201) saying:
>     // FIXME we should OR filters with the same prefix...
> so it looks like support for multiple filters with the same prefix  
> isn't yet implemented.
>
> FWIW, I don't think this will correspond to a bug in omega's  
> handling of filters from "B" cgi parameters, since they don't use  
> the query parser, and that's what the documentation you're  
> referring to was written for, I believe.  But it would be nice to  
> fix it, anyway.
>
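
To check that I understand the rule James quotes from the omega
documentation (OR between filter terms that share a prefix, then AND
between the different prefixes), here is a minimal sketch in plain
Python. It is not Xapian code and calls no Xapian API; build_filter
and the (prefix, value) pairs are hypothetical names made up for
illustration, and the output is only a string that mimics the query
descriptions above.

```python
def build_filter(terms):
    """Describe a filter: OR within a prefix, AND across prefixes.

    terms is a list of (prefix, value) pairs, e.g. [("S", "1"), ("S", "2")].
    This only builds a descriptive string; it does not touch Xapian.
    """
    # Group the terms by prefix, keeping first-seen order of prefixes.
    groups = {}
    for prefix, value in terms:
        groups.setdefault(prefix, []).append(prefix + value)

    # Terms sharing a prefix are alternatives, so join them with OR.
    parts = []
    for members in groups.values():
        if len(members) == 1:
            parts.append(members[0])
        else:
            parts.append("(" + " OR ".join(members) + ")")

    # Different prefixes must all match, so join the groups with AND.
    if not parts:
        return ""
    if len(parts) == 1:
        return parts[0]
    return "(" + " AND ".join(parts) + ")"


if __name__ == "__main__":
    print(build_filter([("S", "1"), ("S", "2")]))
    print(build_filter([("S", "1"), ("S", "2"), ("H", "3")]))
```

With this grouping, "source:1 source:2" would filter on S1 OR S2
rather than S1 AND S2, which is what I was hoping to get.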
>> But it is expected to ignore my OR and convert that to a "regular"  
>> term?
>
> I assume that this is happening because the query parser isn't  
> recognising the OR as an operator.  Is it possible you're setting a  
> flag in the parse_query() method and turning the processing of  
> boolean operators off by mistake?
>

I create the query parser like this:

     $self->qp(Search::Xapian::QueryParser->new($self->db));

(where $self->db is a Search::Xapian::Database object)

I then add some boolean and non-boolean prefixes, set a stemmer etc.

When searching I do

     my $query_obj = $self->qp->parse_query( $processed_query );

so I don't think I am setting any flags.

>> if I omit the OR it produces  Xapian::Query((S1 AND S2))
>> I have     $self->qp->set_default_op(OP_AND);
>> in my setup, but commenting it out makes no difference.
>
> The default operator is for "probabilistic" terms only - it isn't  
> applied to boolean terms (and I don't think it should be).
>

Yeah, I agree, it doesn't make sense :) I just tested it to see
whether it had any effect.

Thanks for your great answers.


- andreas



