NEAR non-leaf subqueries

Olly Betts olly at survex.com
Wed Jan 4 00:22:01 GMT 2017


On Thu, Dec 29, 2016 at 07:21:41PM +0100, Jean-Francois Dockes wrote:
> Xapian 1.2 supports a query like:
> 
>    (A OR B) NEAR (C OR D)
> 
> and distributes the factors to create something like:
> 
>     (A NEAR 2 C) OR (A NEAR 2 D) OR (B NEAR 2 C) OR (B NEAR 2 D)
> 
> Xapian 1.4 rejects such a query with the error message:
> 
>     OP_NEAR and OP_PHRASE only currently support leaf subqueries
> 
> Because Recoll expands the terms to their stem siblings at query time, its
> NEAR queries are affected by the change (no stemming is used with PHRASE
> queries, so these are unaffected).
> 
> Of course, it would be possible to effect the distribution at the
> application level, but, before I get into this, I would like to know if
> there is a plan to restore the 1.2 behaviour, or if the new one is
> permanent?
> 
> I saw https://trac.xapian.org/ticket/508, but it is rather inconclusive as
> to the future plans.

The plan is that this should be supported (see the title of the ticket,
and also note the "currently" in the exception message).

The query internals were completely rewritten between 1.2 and 1.4, which
is why the old support is gone.

The old approach is excessively inefficient so personally I'm not keen to
spend time recreating that - I'd rather we implement this "properly", and
also make sure that it works in a non-surprising way (which blindly
distributing operators doesn't always achieve, as noted in the ticket
comments).
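
For illustration, the application-level distribution mentioned in the
question could look roughly like the sketch below.  It is only a sketch
under stated assumptions: the function names are hypothetical, it works on
plain term strings rather than Xapian Query objects, and it handles only
the case where every NEAR operand is an OR of leaf terms.

```python
from itertools import product


def distribute_near(groups):
    """Expand NEAR over OR-groups into leaf-only term combinations.

    `groups` is a list of term lists, e.g. [["A", "B"], ["C", "D"]] for
    (A OR B) NEAR (C OR D).  Each returned tuple holds one term from every
    group, i.e. the leaf subqueries of one NEAR clause.
    """
    return list(product(*groups))


def render_distributed(groups, window):
    """Render the expansion in the 'A NEAR 2 C' notation used above.

    Assumption: with the Xapian bindings, each tuple would instead be
    turned into an OP_NEAR query over its terms with the given window,
    and the clauses combined with OP_OR.  That mapping is not exercised
    here.
    """
    clauses = [
        "(" + f" NEAR {window} ".join(combo) + ")"
        for combo in distribute_near(groups)
    ]
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(render_distributed([["A", "B"], ["C", "D"]], 2))
```

The number of NEAR clauses is the product of the group sizes, so a few
terms with many stem siblings each expand into a very large OR.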

The ticket has a patch which attempts to handle the OR case (which seems
to be the part you actually care about) but this suffers from issues with
object lifetimes which get a bit involved in the details.  Since there
wasn't a working patch when we got to making the hard decisions about
which tickets to bump to get 1.4.0 out, and since addressing this
shouldn't require ABI changes, it got bumped.

Cheers,
    Olly


