[Xapian-discuss] QueryParser (I think) question
Andreas Marienborg
andreas at startsiden.no
Wed May 23 08:26:05 BST 2007
On May 22, 2007, at 10:01 PM, Richard Boulton wrote:
> Andreas Marienborg wrote:
>> On May 22, 2007, at 4:00 PM, James Aylett wrote:
>>> On Tue, May 22, 2007 at 02:34:42PM +0200, Andreas Marienborg wrote:
>>>
>>>> I would like to search in items that are either from source 1 or
>>>> source 2, for instance, so I tried the following:
>>>> "source:1 OR source:2", but that results in the following query
>>>> after parsing:
>>>>
>>>> Xapian::Query((or:(pos=1) FILTER (S1 AND S2)))
>>>>
>>>> Is that expected? Am I trying to do something that I cannot do, or
>>>> am I just doing something wrong?
>
> There's a combination of a missing feature in the query parser
> here, together with possibly some incorrect use of the query
> parser. The documentation could always stand improvement, too.
>
> Firstly, I should note that I don't get the same behaviour as you
> here (with Xapian 1.0.0, and the standard options for
> parse_query()): I'm guessing from your output that you're using a Xapian 0.9
> release. Upgrading might help a bit, but the handling isn't perfect
> even in 1.0.0, sadly: query parsers are incredibly hard to get
> right for all cases.
>
Yes, but I am unable to find the Search::Xapian Perl modules for 1.0 on
CPAN. Is there anywhere else I might obtain them?
>
> Boolean filter terms are processed specially by the query parser;
> the idea is that they filter the rest of the query, rather than
> contributing to it, so they are pulled out of the query to some
> extent and then processed separately. Any explicit operators (or
> brackets) will cause the subquery on either side of the operator to
> be evaluated, and then the two subqueries will be combined with the
> specified operator.
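A rough Python model of the separation described above may help. It is not Xapian's parser code and does not use Search::Xapian; the `BOOLEAN_PREFIXES` mapping is a hypothetical stand-in for registering a boolean prefix with `add_boolean_prefix()`. It pulls prefixed words out as filter terms, joins the remaining free-text words with the default operator, and (as the thread reports for current releases) ANDs every filter term together.

```python
# Hypothetical model, not Xapian's implementation: boolean filter terms
# are pulled out of the query and applied as a FILTER, while the
# remaining free-text words are joined with the default operator.

BOOLEAN_PREFIXES = {"source": "S"}  # assumed: "source:" maps to term prefix "S"


def group(terms, op):
    """Join terms with op, adding parentheses only for two or more terms."""
    if not terms:
        return ""
    if len(terms) == 1:
        return terms[0]
    return "(" + f" {op} ".join(terms) + ")"


def split_terms(query):
    """Separate free-text words from boolean filter terms."""
    free, filters = [], []
    for word in query.split():
        field, sep, value = word.partition(":")
        if sep and field in BOOLEAN_PREFIXES:
            filters.append(BOOLEAN_PREFIXES[field] + value)
        else:
            free.append(word.lower())
    return free, filters


def build_query(query, default_op="OR"):
    free, filters = split_terms(query)
    if not filters:
        return group(free, default_op)
    if not free:
        return group(filters, "AND")
    # Every filter term is ANDed, whatever its prefix.
    return f"({group(free, default_op)} FILTER {group(filters, 'AND')})"


print(build_query("brukervennlighet source:1 source:2"))
# (brukervennlighet FILTER (S1 AND S2))
```

Because this sketch recognises no operators at all, `build_query("source:1 or source:2")` yields `(or FILTER (S1 AND S2))`, the same shape as the original report when "or" is treated as an ordinary word. The `default_op` argument only affects the free-text words, never the filter terms.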
>
> With 1.0.0, the OR operator will actually work in the situation you
> describe: for your example, I get "site:1 OR site:2" parsing to
> "Xapian::Query((H1 AND H2))" (where site is a boolean prefix for
> H). This appears to be because the two sides of the OR operator are
> evaluated as queries in their own right, and since there are no
> terms, the boolean filter aspect is ignored; and then they are
> combined with the OR operator.
>
> However, a query like "Foo site:1 OR site:2" isn't parsed into
> anything useful: the "Foo site:1" part becomes "foo FILTER H1", and
> this is then ORed with H2. Ideally we would get "foo FILTER (H1 OR
> H2)".
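The effect of an explicit OR can be sketched as follows, assuming (as described above) that each side of the operator is parsed as a query in its own right and the results are then ORed. This is a hypothetical Python model, not Xapian's parser; `site` mapping to the term prefix `H` follows the example in the text.

```python
# Hypothetical model: an upper-case OR splits the query into sides,
# each side is parsed on its own, and the results are ORed together.

BOOLEAN_PREFIXES = {"site": "H"}  # assumed: site is a boolean prefix for H


def group(terms, op):
    """Join terms with op, adding parentheses only for two or more terms."""
    if not terms:
        return ""
    if len(terms) == 1:
        return terms[0]
    return "(" + f" {op} ".join(terms) + ")"


def parse_side(words):
    free, filters = [], []
    for word in words:
        field, sep, value = word.partition(":")
        if sep and field in BOOLEAN_PREFIXES:
            filters.append(BOOLEAN_PREFIXES[field] + value)
        else:
            free.append(word.lower())
    if not free:
        # No free-text terms, so the filter terms form the whole subquery.
        return group(filters, "AND")
    if not filters:
        return group(free, "OR")
    return f"({group(free, 'OR')} FILTER {group(filters, 'AND')})"


def parse_with_or(query):
    sides, current = [], []
    for word in query.split():
        if word == "OR":  # only the upper-case form acts as an operator
            sides.append(current)
            current = []
        else:
            current.append(word)
    sides.append(current)
    return group([parse_side(side) for side in sides], "OR")


print(parse_with_or("Foo site:1 OR site:2"))
# ((foo FILTER H1) OR H2)
```

The filter on the left-hand side never sees `H2`, which is why the result differs from the desired "foo FILTER (H1 OR H2)".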
>
> If I change the OR to "or", so that it is not recognised as a
> boolean operator, I get the behaviour you describe, incidentally.
>
After experimenting a bit more, I am even more baffled:
Query: (brukervennlighet) AND (source:1 OR source:2)
unmodified query: Xapian::Query((brukervennlighet:(pos=1) AND (S1 OR S2)))
which actually works as intended (it passes my test cases :)
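Until the parser ORs same-prefix filters itself, one workaround consistent with the query above is to generate the explicit parenthesised form before passing the string to parse_query(). The helper below is hypothetical and only builds the query string; it assumes the user's text contains balanced parentheses and no prefixed filters of its own.

```python
# Hypothetical helper: wrap the user's free text and OR together the
# requested source: filters, mirroring the query that passed the tests.

def with_source_filter(user_text, sources):
    """Build '(text) AND (source:a OR source:b ...)' for parse_query()."""
    if not sources:
        return user_text
    filt = " OR ".join(f"source:{s}" for s in sources)
    return f"({user_text}) AND ({filt})"


print(with_source_filter("brukervennlighet", [1, 2]))
# (brukervennlighet) AND (source:1 OR source:2)
```

Building the string keeps the existing parse_query() call unchanged; the trade-off is that the free text is scored under an AND rather than filtered, as the parsed output above shows.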
Now, if I drop the parentheses and all explicit ANDs/ORs, I get
the following:
Query: brukervennlighet source:1 source:2
unmodified query: Xapian::Query((brukervennlighet:(pos=1) FILTER (S1 AND S2)))
Adding the OR back in between source:1 and source:2 gives me:
Query: brukervennlighet source:1 OR source:2
unmodified query: Xapian::Query(((brukervennlighet:(pos=1) FILTER S1) OR S2))
But neither of those matches what you got, so I guess something changed
in 1.0 here.
I will redo the tests once I get 1.0 installed.
>>> It's not what I'd expect. The query parser came out of omega, whose
>>> documentation states that boolean terms are combined by OR for
>>> similar prefixes, then AND for the different prefixes to create the
>>> overall FILTER clause. Looks like this documentation is no longer
>>> correct :-/
>
> Currently, the query parser always combines all boolean filter
> terms with AND.
>
> I think this is a bug. I've just checked, and this case isn't
> covered by the query parser testsuite, and there is a comment in
> queryparser.lemony (at line 1201) saying:
> // FIXME we should OR filters with the same prefix...
> so it looks like support for multiple filters with the same prefix
> isn't yet implemented.
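The behaviour the FIXME asks for (and that the omega documentation describes) can be sketched as grouping filter terms by prefix, ORing within each group, and ANDing the groups. This is a hypothetical Python model, not a patch to queryparser.lemony; filters are given as (prefix, value) pairs for illustration.

```python
# Hypothetical sketch: OR filter terms that share a prefix, then AND
# the per-prefix groups to form the overall FILTER clause.
from collections import defaultdict


def combine_filters(filters):
    """filters: list of (prefix, value) pairs, e.g. ("S", "1") for term S1."""
    by_prefix = defaultdict(list)  # keeps first-seen prefix order
    for prefix, value in filters:
        by_prefix[prefix].append(prefix + value)
    groups = []
    for terms in by_prefix.values():
        groups.append(terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")")
    if not groups:
        return ""
    return groups[0] if len(groups) == 1 else "(" + " AND ".join(groups) + ")"


print(combine_filters([("S", "1"), ("S", "2")]))              # (S1 OR S2)
print(combine_filters([("S", "1"), ("S", "2"), ("H", "1")]))  # ((S1 OR S2) AND H1)
```

With this rule, "brukervennlighet source:1 source:2" would filter on (S1 OR S2) instead of the current (S1 AND S2).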
>
> FWIW, I don't think this corresponds to a bug in omega's handling
> of filters from "B" CGI parameters, since those don't use the query
> parser, and I believe that's what the documentation you're
> referring to was written for. But it would be nice to fix it
> anyway.
>
>> But is it expected to ignore my OR and convert it to a "regular"
>> term?
>
> I assume that this is happening because the query parser isn't
> recognising the OR as an operator. Is it possible you're setting a
> flag in the parse_query() method and turning the processing of
> boolean operators off by mistake?
>
I create the query parser like this:
$self->qp(Search::Xapian::QueryParser->new($self->db));
(where $self->db is a Search::Xapian::Database object)
I then add some boolean and non-boolean prefixes, set a stemmer etc.
When searching I do
my $query_obj=$self->qp->parse_query( $processed_query );
so I don't think I am setting any flags.
>> If I omit the OR, it produces Xapian::Query((S1 AND S2)).
>> I have $self->qp->set_default_op(OP_AND);
>> in my setup, but commenting it out makes no difference.
>
> The default operator is for "probabilistic" terms only - it isn't
> applied to boolean terms (and I don't think it should be).
>
Yeah, I agree, it doesn't make sense :) I just tested whether it had
any effect.
Thanks for your great answers
- andreas