[Xapian-discuss] Another query parser bug

Ron Kass ron at pidgintech.com
Tue Oct 23 18:17:05 BST 2007


Actually, as you can see from test #2, the bug is the same if the 
filter is at the end:

    test #2: print "wrong: ".$QueryParser->parse_query(qq{(Title:word) -notallowed},
        (FLAG_BOOLEAN | FLAG_PHRASE | FLAG_LOVEHATE | FLAG_WILDCARD))."\n";

    result #2: wrong: Xapian::Query((Znotallow:(pos=1) FILTER Tword))


    test #4: print "wrong: ".$QueryParser->parse_query(qq{-notallowed Title:word},
        (FLAG_BOOLEAN | FLAG_PHRASE | FLAG_LOVEHATE | FLAG_WILDCARD))."\n";

    result #4: wrong: Xapian::Query((Znotallow:(pos=1) FILTER Tword))


As you can see, it doesn't matter whether -notallowed is at the 
beginning or at the end: the parsed result is the same.

Or did you mean something else when talking about filters at the end?


Best regards,

Ron



Olly Betts wrote:

> On Tue, Oct 23, 2007 at 04:35:04PM +0200, Ron Kass wrote:
>   
>>    print "wrong: ".$QueryParser->parse_query(qq{Title:word
>>    -notallowed},(FLAG_BOOLEAN | FLAG_PHRASE | FLAG_LOVEHATE |
>>    FLAG_WILDCARD))."\n";
>>     
>
> I think this is related to a problem I noticed earlier this week - we
> fail to parse filter-type operations in the middle of a query:
>
>      foo site:example.org bar
>      foo -site:example.org bar
>      foo -ignore bar
>
> People tend to specify filters at the end, which I guess is why
> nobody noticed this before.
>
> I looked into those cases and it's down to the grammar rules not
> allowing it, which is a bug, but a bit more involved to fix than
> your previous one.  I'll add your testcases to mine and check they
> all work when I fix this.
>
>   
>> And one last question regarding the parser in this case..
>> Should/Could there be any performance difference between the following
>> three parsed queries? (FILTER vs AND_NOT and AND_NOT*2 vs AND_NOT/OR)
>> 1. Xapian::Query(((Zterm:(pos=1) Znotallow:(pos=2)) FILTER (Tfirst OR
>> Tword)))
>>     
>
> There seems to be an operator (AND_NOT?) missing before Znotallow.
>
>   
>> 2. Xapian::Query(((Zterm:(pos=1) AND_NOT Znotallow:(pos=2) AND_NOT
>> Tfirst:(pos=3)) FILTER Tword))
>> 3. Xapian::Query(((Zterm:(pos=1) AND_NOT (Znotallow:(pos=2) OR
>> Tfirst:(pos=3))) FILTER Tword))
>>     
>
> I can see that (2) and (3) are essentially the same query represented in
> two different ways.  But (1) seems to be a different query (no matter
> what the missing operator is).  If that's correct, then (1) clearly can
> (and often will) perform differently to (2) and (3).
>
> Currently, (2) and (3) will actually be executed in different ways.  I'm
> not certain which would be more efficient (and it may depend on the
> data).  I suspect there's not much in it unless there are a lot of
> filter terms, in which case my hunch is that (3) might have the edge
> because of the balancing we do for OrPostList trees.  If you have, or
> can easily produce, some benchmark data, it would be interesting to
> know.
>
> I've implemented an internal "QueryOptimiser" class for 1.0.4 which
> provides a much improved framework for building optimal postlist trees
> from queries, so it's now much easier to do this sort of thing.
>
> Cheers,
>     Olly
>   
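
The point that (2) and (3) are the same query written two ways can be
checked on toy data. The following is a minimal sketch, not Xapian code:
it assumes each term maps to a plain set of document ids (all ids are
invented for illustration) and models AND_NOT, OR and FILTER as set
difference, union and intersection. It says nothing about weighting or
about how the postlist trees are actually executed, so it cannot answer
the performance question.

```python
# Toy posting lists: term -> set of document ids containing that term.
# The ids are made up for illustration; this is not the Xapian API.
postings = {
    "Zterm": {1, 2, 3, 4, 5, 6},
    "Znotallow": {2, 5},
    "Tfirst": {3, 5},
    "Tword": {1, 2, 3, 4},
}


def and_not(left, right):
    """Documents matching left but not right."""
    return left - right


def op_or(left, right):
    """Documents matching either side."""
    return left | right


def op_filter(left, right):
    """Documents matching both sides (weights are ignored in this model)."""
    return left & right


def query2(p):
    # ((Zterm AND_NOT Znotallow AND_NOT Tfirst) FILTER Tword)
    inner = and_not(and_not(p["Zterm"], p["Znotallow"]), p["Tfirst"])
    return op_filter(inner, p["Tword"])


def query3(p):
    # ((Zterm AND_NOT (Znotallow OR Tfirst)) FILTER Tword)
    inner = and_not(p["Zterm"], op_or(p["Znotallow"], p["Tfirst"]))
    return op_filter(inner, p["Tword"])


result2 = query2(postings)
result3 = query3(postings)
print(sorted(result2), sorted(result3))
```

Both forms select the same documents because removing B and then C from
a set is the same as removing (B or C) in one step. Any difference
between them is therefore in how the match is executed, not in what it
matches.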

