[Xapian-discuss] Order of NOT operand?

David Sauve dnsauve at gmail.com
Tue Sep 1 13:22:28 BST 2009


On Tue, Sep 1, 2009 at 1:04 AM, Olly Betts <olly at survex.com> wrote:

> On Mon, Aug 31, 2009 at 10:01:58AM -0400, David Sauve wrote:
> > I'm having a strange issue with NOT queries in my xapian backend for
> > Django-Haystack.  The query string is generated through user input, and as
> > such, the order is undetermined.
>
> Hmm, "generated through"?  The query string should really *be* user
> input.  It is almost inevitably a mistake to try to modify it before
> passing it to Xapian.  If you want to apply other filtering, combine
> queries, etc, then do that to the Xapian::Query object(s) produced.
>

To be more specific, the query string is a combination of user input (what
they typed into the search box) and filters such as field equals, exclude,
etc.  These are all applied by Django-Haystack itself in order to make the
backend (in this case Xapian) pluggable.

In practice, it is made up of two parts: a SearchBackend (the Xapian
interface) and a SearchQuery (the part that "cleans" and assembles the query
string into a format that Xapian can recognise).

What I get in the SearchQuery, after Django-Haystack is done, is a series of
field filters.  From these, I need to "re-assemble" a query string to be
passed to the SearchBackend instance later.


> > I wouldn't think that would matter, but
> > the following two queries are generating different search results:
> >
> > java AND NOT id:1 NOT id:2
> > vs.
> > NOT id:1 NOT id:2 AND java
>
> What sort of prefix is "id"?
>

In this case, "id" is a field prefix.


> > Logically, I'd think this would be the same, but in practice, it's not.
> > The first format seems to generate random results, but the second
> > generates the correct results.
>
> They aren't quite the same in practice - the first is:
>
> ((java NOT id:1) NOT id:2)
>
> And the second (with FLAG_PURE_NOT enabled) is:
>
> ((<everything> NOT id:1) NOT id:2) AND java
>
> Ideally the <everything> in the second case would be eliminated by the
> optimiser, but I don't think it currently is.
>
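To see why the two trees above should match the same documents, a small
pure-Python sketch can model each term as the set of document ids it matches,
with AND_NOT as set difference and AND as intersection.  The toy index is made
up for illustration, and the randomised loop only checks the identity on many
small made-up indexes; nothing here uses Xapian itself.

```python
import random

def first_form(index):
    # ((java NOT id:1) NOT id:2)
    return (index["java"] - index["id:1"]) - index["id:2"]

def second_form(all_docs, index):
    # ((<everything> NOT id:1) NOT id:2) AND java
    return ((all_docs - index["id:1"]) - index["id:2"]) & index["java"]

# Made-up toy index: document ids 1-5, each "id:N" term matches only document N.
toy_docs = {1, 2, 3, 4, 5}
toy_index = {"java": {1, 2, 3}, "id:1": {1}, "id:2": {2}}
print(sorted(first_form(toy_index)))             # [3]
print(sorted(second_form(toy_docs, toy_index)))  # [3]

# Randomised check: the identity holds whenever every posting set is a
# subset of the full document set.
rng = random.Random(0)
for _ in range(1000):
    docs = set(range(rng.randint(0, 20)))
    index = {t: {d for d in docs if rng.random() < 0.5}
             for t in ("java", "id:1", "id:2")}
    assert first_form(index) == second_form(docs, index)
```

Since "java" can only match documents in <everything>, intersecting with it at
the end removes the same documents as starting from it, so only efficiency
differs.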

These are the queries I'm seeing:

Xapian::Query(((Zjava:(pos=1) AND_NOT (id:(pos=2) PHRASE 2 1:(pos=3)))
AND_NOT (id:(pos=4) PHRASE 2 2:(pos=5))))

Xapian::Query(((<alldocuments> AND_NOT (id:(pos=1) PHRASE 2 1:(pos=2)))
AND_NOT ((id:(pos=3) PHRASE 2 2:(pos=4)) OR Zjava:(pos=5))))


> But these should both match the same documents.
>
> I'd check the parsed Query objects with get_description() to see if they
> look right.
>
> Cheers,
>     Olly
>

