[Xapian-devel] NearPostList and get_wdf

Olly Betts olly at survex.com
Tue Jan 6 03:47:24 GMT 2009


On Mon, Dec 29, 2008 at 02:09:14PM +0100, Yann ROBIN wrote:
> On Mon, Dec 29, 2008 at 1:50 PM, Richard Boulton
> <richard at lemurconsulting.com> wrote:
> > I'm not sure that modifying the wdf is really the way to go about this - it
> > seems to me that you might do better to use a custom weight class, which
> > factored in the frequencies of the individual terms, as well as their
> > proximity.

You have to choose a weight class for the whole query - it can't be
different for different subqueries.  So I'm not sure how this would
work.

A sane approach would probably be for NewNearPostList::get_weight() to
multiply the weight returned by the AND query's get_weight() method by a
non-negative factor which varies depending on how close the terms are -
largest when they're together, much smaller when they are far apart.
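
As a rough illustration of such a factor (the helper names and the
n_terms / span formula are assumptions chosen for the example, nothing
that exists in Xapian), something along these lines would do:

```cpp
#include <cassert>

// Hypothetical helper, not part of Xapian.
// span:    width of the tightest window holding every term
//          (last position - first position + 1).
// n_terms: number of terms in the NEAR query.
// Returns a value in (0, 1]: 1.0 when the terms are adjacent,
// shrinking towards 0 as they spread apart.
double proximity_factor(unsigned span, unsigned n_terms) {
    assert(n_terms > 0);
    // n_terms distinct positions can't fit in a narrower window, so
    // clamp to keep the result at or below 1.0.
    if (span < n_terms) span = n_terms;
    return static_cast<double>(n_terms) / static_cast<double>(span);
}

// Scale the weight the AND query reported for the current document.
double scaled_weight(double and_weight, unsigned span, unsigned n_terms) {
    return and_weight * proximity_factor(span, n_terms);
}
```

Any other decreasing function of the span would serve equally well, as
long as it never goes negative and its maximum is known.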

This will be slower to run than the current NearPostList though as it
can't stop working on a document when it finds a match within the window
size - instead it has to check all the positional data for each document
matching the AND query to find the closest match.
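
To make the extra work concrete, here is a sketch of the exhaustive
search over plain sorted position vectors.  The function name and the
use of std::vector are assumptions for the example; Xapian's own
position lists are iterators, not vectors.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper, not part of Xapian.
// Each inner vector holds one term's positions in ascending order.
// Returns the width (last - first + 1) of the tightest window that
// contains at least one position from every list, or 0 if there are
// no lists or any list is empty.
unsigned min_span(const std::vector<std::vector<unsigned>>& positions) {
    if (positions.empty()) return 0;
    for (const auto& p : positions) {
        if (p.empty()) return 0;
    }
    std::vector<std::size_t> idx(positions.size(), 0);
    unsigned best = std::numeric_limits<unsigned>::max();
    for (;;) {
        // Find the lowest and highest of the current positions.
        std::size_t lowest = 0;
        unsigned lo = positions[0][idx[0]];
        unsigned hi = lo;
        for (std::size_t i = 1; i < positions.size(); ++i) {
            unsigned p = positions[i][idx[i]];
            if (p < lo) { lo = p; lowest = i; }
            if (p > hi) hi = p;
        }
        unsigned span = hi - lo + 1;
        if (span < best) best = span;
        // Only advancing the list holding the lowest position can
        // shrink the window; stop once that list runs out.
        if (++idx[lowest] == positions[lowest].size()) break;
    }
    return best;
}
```

Unlike a check against a fixed window size, this loop keeps going until
one of the lists is exhausted, which is where the extra cost comes from.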

This factor needs to have a known upper bound, and you multiply the
values returned by the AND query's get_maxweight() and
recalc_maxweight() by that bound.
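
A minimal sketch of how the bound carries through, using a stand-in
struct in place of the real AND postlist.  AndStub, ProximityScaled and
the bound of 2.0 are all invented for the example and don't reflect
Xapian's internal PostList interface.

```cpp
#include <cassert>

// Assumed upper bound of the proximity factor (arbitrary for the example).
constexpr double kMaxFactor = 2.0;

// Hypothetical stand-in for the AND sub-postlist.
struct AndStub {
    double weight;     // weight of the current document
    double maxweight;  // upper bound on weight over all documents
    double get_weight() const { return weight; }
    double get_maxweight() const { return maxweight; }
    double recalc_maxweight() { return maxweight; }
};

class ProximityScaled {
    AndStub& source;
    double factor;  // factor for the current document, in [0, kMaxFactor]
  public:
    explicit ProximityScaled(AndStub& s) : source(s), factor(0.0) {}

    void set_factor(double f) {
        assert(f >= 0.0 && f <= kMaxFactor);
        factor = f;
    }

    // Per-document weight uses the actual factor...
    double get_weight() const { return source.get_weight() * factor; }

    // ...while the bounds use the largest value the factor can take,
    // so get_weight() can never exceed them.
    double get_maxweight() const {
        return source.get_maxweight() * kMaxFactor;
    }
    double recalc_maxweight() {
        return source.recalc_maxweight() * kMaxFactor;
    }
};
```

If the reported maximum were lower than a weight actually returned, the
matcher could wrongly discard documents, so the bound has to be a true
one.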

> > Feel free to open a feature request ticket, describing the feature that you
> > would like to exist.  OP_NEAR as it is currently implemented is behaving as
> > intended, though.
> 
> The ticket was more about get_wdf not being called; I don't think this
> was intended.

Currently NearPostList::get_wdf() and friends are dead code - I think
whoever wrote them probably didn't realise they wouldn't be needed.
It's even possible that they were actually used in a really early
version.

But once the synonym patch gets merged, I think they'll get used if you
do a synonym operation with OP_NEAR or OP_PHRASE as a subquery, so it
seems unhelpful to rip them out at this point.

Cheers,
    Olly


