[Xapian-discuss] floating-point issues with set_sort_by_relevance_then_value? (1.2.3, BM25 k1=0)

Olly Betts olly at survex.com
Mon Nov 1 10:58:49 GMT 2010


On Mon, Nov 01, 2010 at 01:59:42AM +0100, Marinos Yannikos wrote:
> This apparently prevents floating point precision issues in the last line 
> of get_sumpart() [which calculates termweight * wdf_double * 1 / 
> wdf_double].

Yes, for some values of wdf_double and termweight, this doesn't give
exactly termweight.  We should do the division first, and then scale
termweight by the result.

I've reproduced this issue and I'm currently working on a fix.
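
To illustrate the effect outside Xapian, here is a minimal Python sketch
(Python floats are IEEE 754 doubles, like C++ double).  It is not the
actual get_sumpart() code: the function names and the termweight value
0.1 are made up for the example, and multiply_then_divide() is just one
reading of the quoted expression, evaluated left to right.

```python
def multiply_then_divide(termweight, wdf):
    # Hypothetical reading of the quoted expression, evaluated left to
    # right: the product termweight * wdf is rounded to a double before
    # the division, so the final result can be off by one ulp.
    return termweight * wdf * 1 / wdf


def divide_then_scale(termweight, wdf):
    # Do the division first, then scale termweight by the result.
    # wdf / wdf is exactly 1.0 for any finite non-zero wdf, and
    # multiplying by 1.0 is exact.
    return termweight * (wdf / wdf)


def count_mismatches(func, termweight, max_wdf=1000):
    # Count the wdf values in 1..max_wdf for which func fails to
    # return exactly termweight.
    return sum(
        1
        for wdf in range(1, max_wdf + 1)
        if func(termweight, float(wdf)) != termweight
    )


termweight = 0.1  # arbitrary example value, not taken from a real index
naive_bad = count_mismatches(multiply_then_divide, termweight)
fixed_bad = count_mismatches(divide_then_scale, termweight)
print("multiply then divide, inexact results:", naive_bad)
print("divide then scale, inexact results:", fixed_bad)
```

With the division done first, the factor is exactly 1 in the k1=0 case,
so documents which should have equal weights get bit-identical ones and
the secondary sort by value can take effect.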

> It also speeds up my case slightly. ;-)

How much is "slightly"?  Or did you just mean it's doing less work,
rather than that there's a measurable speed-up?

> In order to prevent more such issues, it might be a good idea to round 
> weights to a few fractional digits (10 should be enough) before using 
> them as sort keys.

Rounding isn't a magic solution to such issues, and explicitly rounding
all the weights is extra work.  I think it's better to focus on getting
the calculations right rather than trying to disguise any problems.
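
As a rough illustration of why rounding only moves the problem, the
Python sketch below takes three adjacent doubles around a point halfway
between two 10-digit values and rounds each to 10 fractional digits.
The weight 2.12345678905 is an arbitrary value picked for the example,
neighbours() is a helper invented here, and math.nextafter() needs
Python 3.9 or later.

```python
import math


def neighbours(x):
    # The double just below x, x itself, and the double just above x.
    return [math.nextafter(x, -math.inf), x, math.nextafter(x, math.inf)]


# Arbitrary example weight sitting next to a rounding boundary for
# 10 fractional digits (not taken from a real index).
boundary = float("2.12345678905")

candidates = neighbours(boundary)
rounded = [round(w, 10) for w in candidates]

for w, r in zip(candidates, rounded):
    print(repr(w), "->", repr(r))

# Weights that differ only in the last bit can still end up with
# different rounded sort keys.
distinct_keys = set(rounded)
print("distinct sort keys:", len(distinct_keys))
```

Two weights which differ only by floating-point noise can still land on
opposite sides of such a boundary, so the rounded keys differ just as
the unrounded ones did.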

Cheers,
    Olly


