[Xapian-discuss] Search performance issues and profiling/debugging

Olly Betts olly at survex.com
Fri Nov 2 06:06:07 GMT 2007


On Thu, Oct 25, 2007 at 08:27:33PM +0200, Ron Kass wrote:
> By the way, this is the weight scheme we used.
> my $k1 = 1;             # governs the importance of within document
>                         # frequency. Must be >= 0. 0 means ignore wdf. Default is 1.
> my $k2 = 25;            # compensation factor for the high wdf values in
>                         # large documents. Must be >= 0. 0 means no compensation. Default is 0.
> my $k3 = 1;             # governs the importance of within query
>                         # frequency. Must be >= 0. 0 means ignore wqf. Default is 1.
> my $b = 0.01;           # Relative importance of within document
>                         # frequency and document length. Must be >= 0 and <= 1. Default is 0.5.
> my $min_normlen = 0.5;  # specifies a cutoff on the minimum value that can be
>                         # used for a normalised document length - smaller values
>                         # will be forced up to this cutoff. This prevents very
>                         # small documents getting a huge bonus weight. Default is 0.5.
> [...]
> Then we went back to the old server. Same speed as before (0.9-1.0 sec
> per search), and this time the estimates are stable. So, the weight
> scheme is the cause of the inaccurate estimates. Why?
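
For context, those five values correspond to the arguments Xapian's
BM25Weight constructor takes, in that order.  A minimal sketch of wiring
them up through the Search::Xapian Perl bindings might look like the
following (the database path and query string are just placeholders, and
this assumes the bindings expose the five-argument BM25Weight constructor):

  use strict;
  use warnings;
  use Search::Xapian;

  # The quoted settings, in the constructor's argument order.
  my ($k1, $k2, $k3, $b, $min_normlen) = (1, 25, 1, 0.01, 0.5);

  my $db      = Search::Xapian::Database->new('./index');            # placeholder path
  my $enquire = $db->enquire(Search::Xapian::Query->new('example')); # placeholder query

  $enquire->set_weighting_scheme(
      Search::Xapian::BM25Weight->new($k1, $k2, $k3, $b, $min_normlen));

  my $mset = $enquire->get_mset(0, 10);
  print "estimated matches: ", $mset->get_matches_estimated(), "\n";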

I've made some progress here.  It looks like there's a bug in BM25Weight
where one of the statistics isn't being set correctly, but by default
this doesn't matter since k2 is 0.  If k2 is set to non-zero (as you've
done) then this manifests as an unpredictable factor in the weights.

I've not yet tracked down where this value comes from, but it shouldn't
take long now that I've got it happening in front of me with a four-line
testcase!
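
If anyone else wants to check whether their setup is affected, a rough
illustrative check (not the testcase mentioned above; the path and query
are placeholders again) is to run the same search with k2 = 0 and with a
non-zero k2 and compare the reported estimates - with the bug, the
non-zero case can produce odd or unstable numbers:

  use strict;
  use warnings;
  use Search::Xapian;

  my $db    = Search::Xapian::Database->new('./index');
  my $query = Search::Xapian::Query->new('example');

  for my $k2 (0, 25) {
      my $enquire = $db->enquire($query);
      $enquire->set_weighting_scheme(
          Search::Xapian::BM25Weight->new(1, $k2, 1, 0.01, 0.5));
      printf "k2 = %-2d -> estimated matches = %d\n",
          $k2, $enquire->get_mset(0, 10)->get_matches_estimated();
  }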

Was the segfaulting case also using BM25 with non-zero k2?

Cheers,
    Olly
