[Xapian-discuss] Bug in set_cutoff - xapian-core 1.0.10

Olly Betts olly at survex.com
Sun Jan 18 11:00:18 GMT 2009


On Wed, Jan 14, 2009 at 12:03:22PM +0000, Olly Betts wrote:
> On Tue, Jan 13, 2009 at 12:18:33PM -0800, Kevin Duraj wrote:
> > We have a bug when calling set_cutoff and
> > set_sort_by_value_then_relevance functions. Some documents are
> > displaying at the beginning and at end of result sets but are not
> > displaying in middle of result set.
> 
> I think this is likely the same issue as this bug:
> 
> http://trac.xapian.org/ticket/216

It occurred to me over the weekend that while the bug in that ticket
won't help, a percentage cutoff while sorting primarily by value just
isn't going to work properly as things stand.

The fundamental problem is that we might find a document with a higher
relevance score, which raises the lower bound on the weight that the
percentage cutoff corresponds to.  That makes us drop lower scoring
documents from our proto-MSet, and we would then need to fill the
proto-MSet back up to the required size with documents we've already
discarded because their sort key sorted lower than those in the
proto-MSet.
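
To make that concrete, here is a toy model in Python.  It is only a
sketch of the failure mode described above, not Xapian's matcher: the
names streaming_match, exact_match and DOCS, the (docid, sort key,
weight) tuples and the numbers are all made up for illustration, and it
assumes a higher sort key ranks first.

```python
# Toy model only: hypothetical names and data, not Xapian code.
# Each document is (docid, sort_key, weight); a higher sort key ranks first.
DOCS = [
    ("A", 9, 1.0),
    ("B", 8, 1.0),
    ("C", 7, 2.0),   # discarded early: its sort key is below everything kept
    ("D", 10, 4.0),  # raises the maximum weight, so the cutoff weight rises
]


def streaming_match(docs, size, percent):
    """Keep at most `size` candidates while matching, as a bounded proto-MSet."""
    proto = []  # entries are (sort_key, weight, docid)
    max_weight = 0.0
    for docid, key, weight in docs:
        if weight > max_weight:
            # A better weight raises the cutoff; drop entries now below it.
            max_weight = weight
            cutoff = max_weight * percent / 100.0
            proto = [e for e in proto if e[1] >= cutoff]
        if weight < max_weight * percent / 100.0:
            continue  # fails the percentage cutoff as currently known
        if len(proto) < size:
            proto.append((key, weight, docid))
        else:
            worst = min(proto, key=lambda e: e[0])
            if key > worst[0]:
                proto.remove(worst)
                proto.append((key, weight, docid))
            # Otherwise the document is discarded and cannot be recovered.
    proto.sort(key=lambda e: e[0], reverse=True)
    return [e[2] for e in proto]


def exact_match(docs, size, percent):
    """Reference result: apply the cutoff once the true maximum is known."""
    cutoff = max(w for _, _, w in docs) * percent / 100.0
    kept = [(key, w, docid) for docid, key, w in docs if w >= cutoff]
    kept.sort(key=lambda e: e[0], reverse=True)
    return [e[2] for e in kept[:size]]
```

With size=2 and percent=50, the bounded version ends up holding only D,
because C was thrown away on sort key before D's weight pushed A and B
under the cutoff, whereas the reference result is D followed by C.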

The only fix I see in general is to never discard a matching document
(except by the percentage cutoff test) until the end of the match, when
we know what weight "100%" corresponds to.  Sadly, that could use a lot
of memory for queries which match many documents, but I think that's
probably unavoidable.
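
A rough sketch of that approach, again as a toy Python model rather
than anything from xapian-core (deferred_match, DOCS and the tuple
layout are hypothetical, and a higher sort key is assumed to rank
first).  Candidates are only ever dropped by the cutoff test, which is
safe because the cutoff weight can only rise during the match; sorting
and truncating to the requested size happen at the end.

```python
# Toy model only: hypothetical names and data, not Xapian code.
# Each document is (docid, sort_key, weight); a higher sort key ranks first.
DOCS = [
    ("A", 9, 1.0),
    ("B", 8, 1.0),
    ("C", 7, 2.0),
    ("D", 10, 4.0),
]


def deferred_match(docs, size, percent):
    """Retain every candidate that passes the cutoff until the match ends.

    Returns (docids, peak), where peak is the largest number of candidates
    held at any one time, as a crude stand-in for memory use.
    """
    retained = []  # entries are (sort_key, weight, docid)
    max_weight = 0.0
    peak = 0
    for docid, key, weight in docs:
        if weight > max_weight:
            # The cutoff only ever rises, so pruning by it loses nothing
            # that could appear in the final result.
            max_weight = weight
            cutoff = max_weight * percent / 100.0
            retained = [e for e in retained if e[1] >= cutoff]
        if weight >= max_weight * percent / 100.0:
            retained.append((key, weight, docid))
        peak = max(peak, len(retained))
    # Only now, with the final "100%" weight known, sort and truncate.
    retained.sort(key=lambda e: e[0], reverse=True)
    return [e[2] for e in retained[:size]], peak
```

For size=2 and percent=50 this returns D then C, but it held three
candidates at its peak, more than the two requested.  On a query
matching many documents that gap is where the extra memory goes.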

Cheers,
    Olly