[Xapian-discuss] Bug in set_cutoff - xapian-core 1.0.10
Olly Betts
olly at survex.com
Sun Jan 18 11:00:18 GMT 2009
On Wed, Jan 14, 2009 at 12:03:22PM +0000, Olly Betts wrote:
> On Tue, Jan 13, 2009 at 12:18:33PM -0800, Kevin Duraj wrote:
> > We have a bug when calling set_cutoff and
> > set_sort_by_value_then_relevance functions. Some documents are
> > displaying at the beginning and at end of result sets but are not
> > displaying in middle of result set.
>
> I think this is likely the same issue as this bug:
>
> http://trac.xapian.org/ticket/216
It occurred to me over the weekend that while that bug won't be helping
matters, a percentage cutoff while sorting primarily by value just isn't
going to work properly as things stand.
The fundamental problem is that we might find a document with a higher
relevance score, which raises the lower bound on the weight that the
percentage cutoff corresponds to. That makes us drop lower scoring
documents from our proto-MSet, and we'd then need to fill the proto-MSet
back up to the required size with documents we've already discarded
because their sort key sorted lower than those in the proto-MSet.
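
To make that concrete, here is a toy model in Python. It is only a sketch
of the behaviour described above, not Xapian's matcher: streaming_match,
DOCS, and the (docid, sort_key, weight) layout are made up for the
example, and it assumes a higher sort key ranks first.

```python
# Toy data: (docid, sort_key, weight), in the order the matcher sees them.
DOCS = [
    ("A", 9, 2.0),
    ("B", 8, 2.0),
    ("C", 7, 3.0),
    ("D", 1, 5.0),
]


def streaming_match(docs, size, percent):
    """Keep at most `size` candidates ordered by sort key, applying the
    percentage cutoff against the best weight seen so far."""
    proto = []        # candidates as (sort_key, weight, docid)
    max_weight = 0.0
    for docid, sort_key, weight in docs:
        max_weight = max(max_weight, weight)
        min_weight = max_weight * percent / 100.0
        # The lower bound may just have risen: drop candidates below it.
        proto = [d for d in proto if d[1] >= min_weight]
        if weight < min_weight:
            continue
        proto.append((sort_key, weight, docid))
        proto.sort(reverse=True)
        # Whatever doesn't fit is discarded for good.
        del proto[size:]
    return [docid for _, _, docid in proto]


# With the final cutoff at 50% of 5.0 = 2.5, both C and D qualify, but C
# was thrown away while A and B still occupied the two slots.
result = streaming_match(DOCS, size=2, percent=50)
print(result)
```

C is discarded because its sort key is lower than A's and B's; D then
raises the lower bound enough to evict A and B, and there is nothing left
to refill the result set with.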
The only fix I see in general is to never discard a matching document
(except by the percentage cutoff test) until the end of the match, when
we know what weight "100%" corresponds to. Sadly, that's going to
potentially use a lot of memory for queries which match many documents,
but I think that's probably unavoidable.
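
A rough sketch of that approach in Python, again as a toy model rather
than anything from xapian-core: deferred_cutoff_match, DOCS, and the
(docid, sort_key, weight) layout are hypothetical, and a higher sort key
is assumed to rank first.

```python
# Toy data: (docid, sort_key, weight), in the order the matcher sees them.
DOCS = [
    ("A", 9, 2.0),
    ("B", 8, 2.0),
    ("C", 7, 3.0),
    ("D", 1, 5.0),
]


def deferred_cutoff_match(docs, size, percent):
    """Only discard documents via the cutoff test; truncate to `size`
    once the maximum weight (what "100%" means) is finally known."""
    kept = []         # grows with the number of matches
    max_weight = 0.0
    for docid, sort_key, weight in docs:
        max_weight = max(max_weight, weight)
        # Safe to test against the current lower bound: it only ever
        # rises, so a document failing now would fail at the end too.
        if weight >= max_weight * percent / 100.0:
            kept.append((sort_key, weight, docid))
    # End of match: apply the final cutoff, then sort and truncate.
    min_weight = max_weight * percent / 100.0
    survivors = [d for d in kept if d[1] >= min_weight]
    survivors.sort(reverse=True)
    return [docid for _, _, docid in survivors[:size]]


top = deferred_cutoff_match(DOCS, size=2, percent=50)
print(top)
```

The cost is the `kept` list, which can hold every matching document that
passes the running cutoff test, hence the memory concern.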
Cheers,
Olly