[Xapian-discuss] Bug in set_cutoff - xapian-core 1.0.10

Kevin Duraj kevin.softdev at gmail.com
Tue Feb 3 05:30:06 GMT 2009


Olly,

We should not make any change to the code if that would cause a
performance loss or significantly higher memory usage. Let's leave
everything as it is, because that way we can still work with result
sets of many millions of documents. The top search result is always
good after cutoff and sorting, and that is the most important thing.

Thanks,
Kevin Duraj

On Sun, Jan 18, 2009 at 3:00 AM, Olly Betts <olly at survex.com> wrote:
> On Wed, Jan 14, 2009 at 12:03:22PM +0000, Olly Betts wrote:
>> On Tue, Jan 13, 2009 at 12:18:33PM -0800, Kevin Duraj wrote:
>> > We have a bug when calling both the set_cutoff and
>> > set_sort_by_value_then_relevance functions. Some documents are
>> > displayed at the beginning and at the end of the result set but
>> > not in the middle of it.
>>
>> I think this is likely the same issue as this bug:
>>
>> http://trac.xapian.org/ticket/216
>
> It occurred to me over the weekend that while this bug won't help, a
> percentage cutoff while sorting primarily by value just isn't going to
> work properly as things stand.
>
> The fundamental problem is that we might find a document with a higher
> relevance score, which raises the lower bound on the weight that the
> percentage cutoff corresponds to. That makes us drop lower scoring
> documents from our proto-MSet, and we would then need to fill the
> proto-MSet back up to the required size with documents we've already
> discarded because their sort key sorted lower than those in the
> proto-MSet.
>
> The only fix I see in general is to never discard a matching document
> (except by the percentage cutoff test) until the end of the match, when
> we know what weight "100%" corresponds to.  Sadly, that's going to
> potentially use a lot of memory for queries which match many documents,
> but I think that's probably unavoidable.
>
> Cheers,
>    Olly
>
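
For anyone following the thread, here is a toy Python sketch of the problem Olly describes. It is not Xapian's matcher code. It assumes that "100%" is simply the highest weight seen so far, and the names Doc, streaming_match and deferred_match are made up for illustration. The first function keeps a bounded proto-MSet ordered by sort key and throws away anything that does not fit. The second only discards on the percentage test until the end of the match.

```python
from collections import namedtuple

# Hypothetical stand-in for a matching document: a label, the sort value,
# and the relevance weight.
Doc = namedtuple("Doc", ["name", "key", "weight"])


def rank(doc):
    # Sort by value first (highest key first), then by relevance.
    return (-doc.key, -doc.weight)


def streaming_match(docs, size, percent):
    """Bounded proto-MSet: discard as we go (simplified model)."""
    proto = []
    max_weight = 0.0
    for doc in docs:
        max_weight = max(max_weight, doc.weight)
        min_weight = max_weight * percent / 100.0
        # A new top weight raises the cutoff and may empty slots ...
        proto = [d for d in proto if d.weight >= min_weight]
        if doc.weight >= min_weight:
            proto.append(doc)
            proto.sort(key=rank)
            # ... but documents truncated here can never come back.
            del proto[size:]
    return proto


def deferred_match(docs, size, percent):
    """Discard only on the percentage test; truncate at the very end."""
    kept = []
    max_weight = 0.0
    for doc in docs:
        max_weight = max(max_weight, doc.weight)
        if doc.weight >= max_weight * percent / 100.0:
            kept.append(doc)  # memory grows with the number of matches
    min_weight = max_weight * percent / 100.0
    kept = [d for d in kept if d.weight >= min_weight]
    kept.sort(key=rank)
    return kept[:size]


# Made-up documents in the order the matcher sees them.
DOCS = [
    Doc("A", 9, 4.0),
    Doc("B", 8, 4.0),
    Doc("C", 5, 6.0),
    Doc("D", 4, 7.0),
    Doc("E", 1, 10.0),
]

print([d.name for d in streaming_match(DOCS, 2, 50)])  # ['E']
print([d.name for d in deferred_match(DOCS, 2, 50)])   # ['C', 'D']
```

With a 50% cutoff and two results requested, the bounded version drops C and D because their sort keys are lower than those of A and B, then loses A and B once E raises the cutoff, so only E is left. The deferred version returns C and D, at the price of holding every candidate in memory until the match finishes.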


