NEAR non-leaf subqueries

Olly Betts olly at survex.com
Fri Jan 20 23:23:32 GMT 2017


On Fri, Jan 20, 2017 at 03:35:13PM +0100, Jean-Francois Dockes wrote:
> Olly Betts writes:
>  > On Thu, Jan 12, 2017 at 07:53:21PM +0100, Jean-Francois Dockes wrote:
>  > 
>  > > Recoll also supports multi-word synonyms which could potentially
>  > > generate PHRASE subqueries inside NEAR queries, but this
>  > > understandably already did not work with 1.2, so the multi-word
>  > > expansions are only used when proximity is not involved (by the way,
>  > > proximity of phrases does make sense in this case, if there is a
>  > > wishlist somewhere, but it's admittedly not an issue that most users
>  > > will be concerned with...).
>  > 
>  > Another case for https://trac.xapian.org/ticket/508 I think.
> 
> The ticket only lists OP_OR as subqueries

OP_OR is the example used in the description, but the ticket isn't only
about OP_OR - note "OP_OR, *etc*" in the description, the title says
"non-leaf subqueries", and other operators are explicitly discussed:

* OP_AND: https://trac.xapian.org/ticket/508#comment:8
* OP_AND_NOT: https://trac.xapian.org/ticket/508#comment:11

I've added a note about OP_NEAR/OP_PHRASE.

>  > The code I pushed before wouldn't handle an OR of more than two things,
>  > so you couldn't do a 3+-way stem expansion:
>  > 
>  >     (text OR texts) NEAR (search OR searches OR searched OR searching)
>  > 
>  > But I've just pushed an update which will handle this.
> 
> Ok, I hadn't even noticed the limitation. Did it silently truncate the
> OR list?

It would throw Xapian::UnimplementedError.

> But, actually, so does the previous version (commit 389dfb319a66), which
> explains why I had not understood what the limitation was.
> 
> Both versions also work fine with "floor floor floor"p:
> 
> (floors OR flooring OR floored OR floor) NEAR 13
> (floors OR flooring OR floored OR floor) NEAR 13
> (floors OR flooring OR floored OR floor)
> 
> So: me happy but confused...

I suspect that at most two of those terms are present in any given
document in your database. The limitation was actually on the number of
terms in the OR which return positions for the same document, not the
number of terms in the query.
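
To make that distinction concrete, here is a toy model in Python. It is
only a sketch and not Xapian's implementation: the names or_positions,
near and MAX_POSITIONAL_BRANCHES, the dict-based documents, and the
"closer than window words" test are all assumptions made up for
illustration. The point it shows is that the check counts the OR
branches which have positions in a given document, so a four-term OR
passes as long as no document contains more than two of the terms.

```python
class UnimplementedError(Exception):
    """Stand-in for Xapian::UnimplementedError in this toy model."""


# Assumed limit of the earlier code: two OR branches with positions.
MAX_POSITIONAL_BRANCHES = 2


def or_positions(doc, terms, limit=MAX_POSITIONAL_BRANCHES):
    """Merge the positions of every OR branch that occurs in doc.

    doc maps each term to a list of word positions. The limit applies
    to the branches present in this document, not to len(terms).
    Pass limit=None to model the updated code with no such limit.
    """
    present = [t for t in terms if doc.get(t)]
    if limit is not None and len(present) > limit:
        raise UnimplementedError(
            "%d OR branches have positions in this document" % len(present)
        )
    return sorted(p for t in present for p in doc[t])


def near(doc, left, right, window, limit=MAX_POSITIONAL_BRANCHES):
    """True if a position from each OR group lies within window words."""
    a = or_positions(doc, left, limit)
    b = or_positions(doc, right, limit)
    return any(0 < abs(x - y) < window for x in a for y in b)


LEFT = ["text", "texts"]
RIGHT = ["search", "searches", "searched", "searching"]

# Only two of the four right-hand terms occur here, so the check passes.
doc_a = {"text": [3], "search": [5], "searching": [40]}

# Three of the four right-hand terms occur here, so the limit is hit.
doc_b = {"texts": [1], "search": [2], "searches": [9], "searched": [20]}
```

In this model near(doc_a, LEFT, RIGHT, 10) succeeds even though the
right-hand OR has four terms, while near(doc_b, LEFT, RIGHT, 10) raises
UnimplementedError. Passing limit=None stands in for the updated code.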

Cheers,
Olly


