[Xapian-devel] [Xapian-commits] 10821: trunk/xapian-core/ trunk/xapian-core/api/

Olly Betts olly at survex.com
Sun Jul 13 10:38:45 BST 2008


On Mon, Jul 07, 2008 at 08:06:32AM +0100, Richard Boulton wrote:
> Olly Betts wrote:
> > On Sun, Jul 06, 2008 at 11:57:40PM +0100, richard wrote:
> >> api/omenquire.cc: When calculating percentages, round to the
> >> nearest integer, rather than rounding down.  There was a FIXME
> >> about this, but no explanation of why it hadn't already been
> >> done, and I can see no bad side effects so far.  The most obvious
> >> positive effect is that queries which should get precisely 100%
> >> will no longer be assigned 99% due to rounding errors.
> > 
> > Well, one issue is that queries which shouldn't get precisely 100% now
> > can...
> > 
> > I don't know how common an issue that is, but then I don't know how
> > common the issue you mention is either.
> 
> The test case I committed yesterday suffered from this problem for me, 
> and I've certainly seen it before (generally with large queries), but I 
> couldn't guess at a rate at which it occurs.

I can't reproduce this issue with the patch reversed, and the patch makes
handling of the percentage cutoff inconsistent: setting the cutoff to n%
doesn't return documents which only got n% by being rounded up.
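
To illustrate the inconsistency, here is a minimal sketch.  The helper
names are hypothetical and this is not the actual code in
api/omenquire.cc; it assumes the cutoff is compared against the unrounded
weight while the displayed percentage is rounded to nearest.

```cpp
#include <cmath>

// Hypothetical helper: percentage rounded down (the old behaviour).
int percent_floor(double weight, double max_weight) {
    return static_cast<int>(std::floor(100.0 * weight / max_weight));
}

// Hypothetical helper: percentage rounded to nearest (the patched behaviour).
int percent_nearest(double weight, double max_weight) {
    return static_cast<int>(std::floor(100.0 * weight / max_weight + 0.5));
}

// Assumed cutoff test: compares the unrounded weight against the cutoff.
bool passes_cutoff(double weight, double max_weight, int cutoff_percent) {
    return weight >= max_weight * cutoff_percent / 100.0;
}
```

With weight 79.6 and max_weight 100.0, percent_nearest() reports 80, yet
passes_cutoff() with a cutoff of 80 rejects the document.  With
percent_floor() the reported value is 79, which agrees with the cutoff.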

So I've reversed it for now (and added a testcase pctcutoff3 to show the
issue, which failed with the patch applied).

> I don't think it's unreasonable to return 100% for a document which 
> matches well enough to get 99.5%; and it's certainly more reasonable 
> than returning 99% for a document which actually got 99.999999%.
> 
> I suppose we could instead round up only very slightly, so that a 
> document needed to get at least 99.9999% or so to be returned with 100%. 

If it's going to be a threshold, we should pick one appropriate for the
rounding errors that can happen rather than something arbitrary.  Or else
ensure that "matches all documents" is handled specially such that
rounding isn't an issue (which I thought already happened as a short cut
actually, so I'm not sure how we can get rounding errors here - excess
precision on x86 maybe?)
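
The threshold idea could be sketched roughly as below.  The function name
is hypothetical, and the tolerance is left as a parameter precisely
because its value would need to come from an analysis of the rounding
errors that can actually occur, which this sketch does not attempt.

```cpp
#include <cmath>

// Hypothetical helper: round down, but first add a small tolerance so a
// value which is only fractionally below an integer percentage because of
// floating point error still reaches that percentage.
// `tolerance` is in percentage points and is an assumption of this sketch.
int percent_with_tolerance(double weight, double max_weight, double tolerance) {
    double raw = 100.0 * weight / max_weight;
    return static_cast<int>(std::floor(raw + tolerance));
}
```

With a placeholder tolerance of 1e-6, a raw value of 99.9999999999 becomes
100, while 99.5 still becomes 99.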

If we're going to round, we need to fix how the percentage cut-off is
handled by the matcher to account for the 0.5% shift.
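
A rough sketch of what accounting for the shift might look like, using
hypothetical helper names rather than the matcher's real code: if the
displayed percentage is rounded to nearest, the cutoff has to be applied
0.5 lower on the unrounded scale.

```cpp
#include <cmath>

// Hypothetical helper: percentage rounded to the nearest integer.
int percent_nearest(double weight, double max_weight) {
    return static_cast<int>(std::floor(100.0 * weight / max_weight + 0.5));
}

// Hypothetical cutoff test which is consistent with percent_nearest():
// a document displays as at least cutoff_percent exactly when its
// unrounded percentage is at least cutoff_percent - 0.5.
bool passes_cutoff_rounded(double weight, double max_weight, int cutoff_percent) {
    double min_raw_percent = cutoff_percent - 0.5;
    return 100.0 * weight / max_weight >= min_raw_percent;
}
```

Here a document with weight 79.6 out of 100.0 is displayed as 80% and also
passes a cutoff of 80, while one with weight 79.4 is displayed as 79% and
is rejected.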

Do you have a repeatable testcase where this happens?

Cheers,
    Olly


