NEAR non-leaf subqueries

Jean-Francois Dockes jf at dockes.org
Thu Jan 12 18:53:21 GMT 2017


Olly Betts writes:
 > On Wed, Jan 04, 2017 at 07:29:58AM +0100, Jean-Francois Dockes wrote:
 > > Olly Betts writes:
 > > > The ticket has a patch which attempts to handle the OR case (which seems
 > > > to be the part you actually care about) but this suffers from issues with
 > > > object lifetimes which get a bit involved in the details.  Since there
 > > > wasn't a working patch when we got to making the hard decisions about
 > > > which tickets to bump to get 1.4.0 out, and since addressing this
 > > > shouldn't require ABI changes, it got bumped.
 > > 
 > > Thank you for this answer.
 > > 
 > > I need to choose between three approaches:
 > > 
 > >  - Implement support at the application level.
 > >  - Shift back to 1.2 
 > >  - Just wait for 1.4.x
 > 
 > Or help fix up the patch in the ticket?

Yep. But I earnestly believe that I'm not up to the task of fiddling with
Xapian internals. You may remember that I gave it a try quite a long time
ago (it was the very same issue, actually) and that, if I remember
correctly, my change did not quite do what it was supposed to do...

 > > I'd rather go back to 1.2 than use a patched 1.4 by the way.
 > 
 > Once we have a working patch, it should be mergable into 1.4.x (I can't
 > see why any ABI changes would be needed) so using a patched 1.4
 > shouldn't be an issue.

My sentence was unclear. To explain: I could use a patched 1.4 on Windows,
where libxapian is bundled with recoll, but I was thinking ahead to a
situation where I'd have a 1.2/1.4 choice on Linux, where bundling a
patched 1.4 would not be acceptable. In the latter case, I'd rather use 1.2
because of the NEAR issue.

 > > This all depends on your expected schedule (I guess that this would have
 > > been a better term than 'plan', which is indeed described in the ticket). I
 > > am not asking for anything beyond information here. Do you have any idea of
 > > the very approximate time when the change might be implemented?
 > 
 > I had another poke at the patch and have a reworked version which solves the
 > object lifetime issue and works for some simple tests.  Can you try it out
 > and see if it works for you?
 > 
 > https://trac.xapian.org/ticket/508#comment:13
 > 
 > There are two limitations:
 > 
 >  * Only OP_OR subqueries are handled.  I think supporting these would be a
 >    useful step forward by itself, and AIUI it's all you actually need.

Yes, my need arises from stem or synonym expansions occurring inside a NEAR
query.

This happens without the user doing anything special, so it's a problem
when it causes an error.
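
To illustrate the shape of the problem, here is a toy model in Python. It is
not Xapian code: the index layout, the expansion table, the function names and
the simplified window test are all assumptions made up for this example. Each
word of the NEAR query is replaced by the OR of its expansions, so the match
has to be decided on the merged positions of every expansion.

```python
from itertools import product

# Hypothetical positional index for one document: term -> sorted word positions.
doc_positions = {
    "floor": [3, 40],
    "floors": [17],
    "plan": [90],
    "plans": [19],
}

# Hypothetical stem expansions, as an application might generate them.
expansions = {
    "floor": ["floor", "floors"],
    "plan": ["plan", "plans"],
}


def or_positions(terms, index):
    """Positions where any of the OR'd terms occurs (sorted, de-duplicated)."""
    merged = set()
    for term in terms:
        merged.update(index.get(term, []))
    return sorted(merged)


def near(groups, index, window):
    """True if one position from each OR group fits inside `window` words.

    Simplified test (assumption): the spread between the lowest and highest
    chosen positions must be smaller than `window`.
    """
    lists = [or_positions(group, index) for group in groups]
    return any(max(combo) - min(combo) < window for combo in product(*lists))


# NEAR over plain terms versus NEAR over the OR'd expansions.
plain = near([["floor"], ["plan"]], doc_positions, 10)
expanded = near([expansions["floor"], expansions["plan"]], doc_positions, 10)
```

In this made-up document only the expanded forms ("floors" at 17, "plans" at
19) are close together, so the query matches only if the OR subqueries are
accepted inside NEAR.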

Recoll also supports multi-word synonyms, which could potentially generate
PHRASE subqueries inside NEAR queries. Understandably, this did not work
with 1.2 either, so the multi-word expansions are only used when proximity
is not involved. (By the way, proximity of phrases does make sense in this
case, if there is a wishlist somewhere, but it's admittedly not an issue
that most users will be concerned with...)


 >  * Currently the OP_OR subqueries can only have two subqueries of their own.
 >    Lifting this restriction needs a bit of work on the new OrPositionList
 >    class - the old patch used a series of pairwise OrPositionList objects,
 >    but the new patch needs a single one instead - the class needs reworking
 >    to handle that.
 > 
 > So I think the second limitation needs addressing, and of course any bugs
 > resolving.

I am not sure that I completely understand this paragraph, but, anyway,
although I have a bit of trouble reading my own code, I think that recoll
will only add flat OP_OR queries as subqueries of the NEAR one. I tested
the patch and it does seem to answer my selfish needs...

 > I can't promise anything re schedule, but hopefully we can sort this out
 > fairly soon.  At least the solution for what's missing now is fairly clear -
 > we probably want to put the sub-positionlists into a min heap.

See, you lost me with that last sentence, and that's why it's better that I
don't get into Xapian-core internals :)
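
As a rough illustration of the min-heap idea mentioned above, the following
Python sketch merges any number of sorted position lists into one ascending
stream. It is not the OrPositionList code from the ticket: the function name
and the data layout are hypothetical, and it only shows the general technique
of keeping the head of each sub-list in a heap.

```python
import heapq


def merged_positions(position_lists):
    """Yield the union of several sorted position lists in ascending order.

    The heap holds one (position, list_index, offset) entry for each
    sub-list that still has positions left, so the smallest head is
    always on top.
    """
    heap = [(plist[0], i, 0) for i, plist in enumerate(position_lists) if plist]
    heapq.heapify(heap)
    last = None
    while heap:
        pos, i, offset = heapq.heappop(heap)
        if pos != last:
            # Report each position once, even if several sub-lists share it.
            yield pos
            last = pos
        offset += 1
        if offset < len(position_lists[i]):
            # Put the next position of the same sub-list back on the heap.
            heapq.heappush(heap, (position_lists[i][offset], i, offset))


# Three sub-lists, as a flat OR of three terms might produce.
merged = list(merged_positions([[3, 40], [17], [5, 17, 90]]))
```

With k sub-lists, each step costs O(log k) instead of the chain of pairwise
merges that the old patch is described as using.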

Anyway, it's good enough to know that a patch exists which will hopefully
make its way into 1.4.x, meaning that I have no need to work on a bad
application-level solution. Thanks!

Cheers,

jf


