[Xapian-discuss] Order of NOT operand?

Richard Boulton richard at tartarus.org
Tue Sep 1 14:31:00 BST 2009


2009/9/1 David Sauve <dnsauve at gmail.com>

>  To be more specific, the query string is a combination of user input (what
> they typed into the search box), and filters such as field equals, exclude,
> etc.  These are all done by Django-Haystack itself in order to make the
> backend (in this case Xapian) pluggable.
>

It's probably about time I checked out a copy of the django-haystack xapian
backend.  This sounds very unpleasant.  As Olly said, it's almost certainly
a mistake to be constructing a string to pass to the query parser, rather
than passing it user input directly.  Constructing query strings without
running into unexpected problems with quoting, operator precedence, etc, is
almost impossible, and is always going to be fragile with respect to changes
in the query parser.  This is because the query parser is not parsing a
formal grammar - it is trying to guess the user's intention to some extent,
and is thus likely to get confused when presented with input which isn't
actually user input, but is machine generated.

Is this an unavoidable result of the way the rest of Django-Haystack works?
Is there no way that haystack can be persuaded to give the backend the raw
input?  If not, it sounds like a bug in Django-Haystack's design, to me...

Looking at
http://github.com/notanumber/xapian-haystack/blob/d593924386cc050e3e97ce129ff71dad50e1139e/xapian_backend.py#L268
however, it looks like the search() method is presented with the user's
query string separately from the list of fields to filter on.  Maybe I'm
misinterpreting.  Also, could the "build_query" function at line 879 in that
file not return a structured representation of the query, rather than a
single string?  (If there's some reason imposed by haystack that forces it
to be a string, you could always serialise it to a pickle or a JSON value
before passing it through.)
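
As a rough sketch of that idea (not haystack's actual API: the function
names and the dictionary layout below are made up for illustration),
build_query could keep the user's text and the filters separate, and
serialise the result to JSON only because a string is required:

```python
import json


def build_query(user_input, filters=None, excludes=None):
    """Return a structured query instead of one flattened query string.

    The keys used here are hypothetical; they are not part of haystack.
    """
    return {
        "user_input": user_input,          # raw text, exactly as typed
        "filters": dict(filters or {}),    # field -> required value
        "excludes": dict(excludes or {}),  # field -> list of excluded values
    }


def serialise_query(query):
    """Encode the structured query as a JSON string."""
    return json.dumps(query, sort_keys=True)


def deserialise_query(payload):
    """Decode the JSON string back into the structured query."""
    return json.loads(payload)


# "java", excluding the documents with id 1 and id 2.
query = build_query("java", excludes={"id": ["1", "2"]})
payload = serialise_query(query)
restored = deserialise_query(payload)
```

The backend can then hand restored["user_input"] to the query parser
untouched, and deal with the filters and excludes itself.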

> In practice, it is made up of two parts, a SearchBackend (the Xapian
> interface), and a SearchQuery (the bit that "cleans" and assembles the
> query string into a format that Xapian can recognise).
>
> What I get, after Django-Haystack is done, in the SearchQuery, is a series
> of filters for fields.  From this, I need to "re-assemble" a query string
> to be passed to the SearchBackend instance at a later time.
>

> > > I wouldn't think that would matter, but
> > > the following two queries are generating different search results:
> > >
> > > java AND NOT id:1 NOT id:2
> > > vs.
> > > NOT id:1 NOT id:2 AND java
> >
> > What sort of prefix is "id"?
>
> In this case, "id" is a field prefix.
>

Looking at the results of your parse below, it looks like this field prefix
isn't being set on the query parser (with either add_prefix() or
add_boolean_prefix()).  As a result, the ":" is being treated as a
phrase-generating word separator, and Xapian is trying to look for
occurrences of "id" followed by "1" or "2".  It also looks like you might not
be supplying the right flags to the query parser to allow it to recognise
the "AND" in the second one.

> Xapian::Query(((Zjava:(pos=1) AND_NOT (id:(pos=2) PHRASE 2 1:(pos=3)))
> AND_NOT (id:(pos=4) PHRASE 2 2:(pos=5))))
>
> Xapian::Query(((<alldocuments> AND_NOT (id:(pos=1) PHRASE 2 1:(pos=2)))
> AND_NOT ((id:(pos=3) PHRASE 2 2:(pos=4)) OR Zjava:(pos=5))))
>
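
To illustrate the two points above (registering the prefix, and not
pushing machine-generated filters through the parser), here is a minimal
sketch using the Xapian Python bindings.  The function name is made up,
and the "Q" term prefix and the English stemmer are assumptions; use
whatever the indexer actually used.  Only the user's text goes through
the query parser, and the exclusions are attached as Query objects:

```python
def build_xapian_query(user_input, excluded_ids, id_prefix="Q"):
    """Parse only the user's text, then exclude documents by id term.

    Hypothetical helper; id_prefix="Q" is an assumption and must match
    the term prefix used at index time.
    """
    # Imported here so the sketch can be loaded without the bindings.
    import xapian

    parser = xapian.QueryParser()
    parser.set_stemmer(xapian.Stem("english"))
    parser.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
    # Without this, "id:1" typed by a user is parsed as a phrase.
    parser.add_boolean_prefix("id", id_prefix)

    # Flags that let the parser recognise AND, OR and NOT, quoted
    # phrases, and the +/- operators.
    flags = (xapian.QueryParser.FLAG_BOOLEAN
             | xapian.QueryParser.FLAG_PHRASE
             | xapian.QueryParser.FLAG_LOVEHATE)
    query = parser.parse_query(user_input, flags)

    if excluded_ids:
        # One term per excluded id, OR-ed together and subtracted.
        excluded = xapian.Query(
            xapian.Query.OP_OR,
            [xapian.Query(id_prefix + str(doc_id)) for doc_id in excluded_ids],
        )
        query = xapian.Query(xapian.Query.OP_AND_NOT, query, excluded)
    return query
```

Calling build_xapian_query("java", [1, 2]) and printing the result should
then show the exclusions as prefixed terms rather than as phrases, and the
order in which the filters were collected no longer matters.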

-- 
Richard

