Better Understanding of Programmatic Query Construction

Olly Betts olly at survex.com
Fri Jan 7 03:35:36 GMT 2022


On Sun, Dec 26, 2021 at 11:41:08PM -0500, Dustin Oprea wrote:
> It doesn't seem as if there is much documentation for query building. I've
> been mostly biased towards Python documentation in my searches. There
> doesn't appear to be a way to search the email archives.
> 
> What documentation there is mentions this example:
> 
> (
> https://github.com/xapian/xapian-docsprint/commit/f04c97f4d1722c2796ba5d807f441d5d2d4eec4d#diff-6ae69d2eefbbb95e7140a8e82ce0751fa6872172d52a05a2c7586e938bf8e4d1R288
> )

You'll find a more readable version of that here:

https://xapian.org/docs/bindings/python3/introduction.html#query

As noted there:

| The Python API largely follows the C++ API - the differences and
| additions are noted below.

At present at least, you'll want to look at the C++ API docs for
guidance, in this case:

https://xapian.org/docs/apidoc/html/classXapian_1_1Query.html

The document you're looking at only covers how the Python API differs
from the C++ one.

> Based on this limited amount of information, I tried converting my original
> string query from something like:
> 
> 'TERM1' AND title:"TERM2"
> 
> to (each more unbounded/desperate than the previous):
> 
> 1: q = xapian.Query(xapian.Query.OP_AND, "'TERM1'", "TERM2") (based on the
> first statement)
> 2: q = xapian.Query(xapian.Query.OP_AND, ["'TERM1'", "TERM2"])
> 3: q = xapian.Query(xapian.Query.OP_AND, ["TERM1", "TERM2"])
> 4: q = xapian.Query(xapian.Query.OP_AND, ["TERM1"])
> 5: q = xapian.Query(xapian.Query.OP_OR, ["TERM1"])

You can see the xapian.Query object that the QueryParser produces by
calling str() on it:

    $ python3 -c 'import sys, xapian; qp = xapian.QueryParser(); qp.add_prefix("title", "S"); print(str(qp.parse_query(sys.stdin.readline())))'
    'TERM1' AND title:"TERM2" 

    Query((term1@1 AND Sterm2@2))

(Here I've fed the query string in on stdin to avoid awkward quoting
since your query string contains both single and double quotes.)

The @1 and @2 are query positions, which mostly don't matter - the main
thing they currently support is iterating terms in "query order", which
might not always be the same as the order within the xapian.Query tree
- e.g. the query `-foo bar` -> Query((Zbar@2 AND_NOT Zfoo@1))

Assuming you don't care about query positions, then:

    q = xapian.Query(xapian.Query.OP_AND, ["term1", "Sterm2"])

If you want the positions set too:

    q = xapian.Query(xapian.Query.OP_AND, [xapian.Query("term1", 1, 1), xapian.Query("Sterm2", 1, 2)])

> Whereas the string query yielded results, I got zero results in each of
> these. What am I doing wrong? I'd appreciate someone explaining how to do
> literal (read: unstemmed, proper noun) searches. I'm not sure if wrapping
> in an inner set of quotes makes sense in this situation.

Your problems are:

* Terms are normalised even without stemming (relevant here:
  case-folded to lower case, and some punctuation ignored)
* `title:` is mapped to a term prefix (I'm assuming you're using `S` as
  that's the usual term prefix for `title:`)
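
To make those two points concrete, here is a rough pure-Python
approximation of how a word in a query string ends up as an index term.
The helper `to_term` is hypothetical and much simpler than what the
QueryParser actually does (it only strips surrounding punctuation and
lower-cases), but it shows why "'TERM1'" and "TERM2" never matched.

```python
import string


def to_term(word, prefix=""):
    """Roughly approximate how a query word becomes an index term.

    Hypothetical helper for illustration only; this is not Xapian's
    real tokeniser.
    """
    # Drop surrounding punctuation such as quotes, then case-fold.
    core = word.strip(string.punctuation).lower()
    # A field such as title: contributes a term prefix (assumed "S" here).
    return prefix + core


print(to_term("'TERM1'"))        # term1
print(to_term('"TERM2"', "S"))   # Sterm2
```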

> subq = xapian.Query(xapian.Query.OP_AND, "hello", "world")
> q = xapian.Query(xapian.Query.OP_AND, [subq, "foo", xapian.Query("bar", 2)])

> Also, I'm assuming that the example translates to "hello AND world AND foo
> AND ??", but how does that *xapian.Query("bar", 2)* term translate?

It's `bar` but with within query frequency (wqf) set to 2.  I don't think
there's a way to create exactly this query by parsing a query string.
The QueryParser is intended to parse user-entered search queries, not to
provide a way to generate every possible Query object tree from a string
specification.

Aside from that detail, this would parse to the above:

    ("hello" AND "world") AND "foo" AND "bar"

The parentheses aren't important here though since the meaning is the
same without them (and the query optimiser knows that).

I've quoted the terms here to prevent stemming (since xapian.Query just
takes the term exactly as specified).  If you haven't set a stemmer on
the QueryParser then those are not needed.

Cheers,
    Olly


