[Xapian-discuss] Search performance issues and profiling/debugging

Ron Kass ron at pidgintech.com
Fri Nov 2 11:27:49 GMT 2007


I'm not sure whether the segfault was limited to BM25 or whether it
also happened without changing the default weighting scheme.
I think it might have happened even without changing it. But once we get 
to the bottom of the BM25 bug, we can stress test things again and see.
One thing that I think was fairly clear is that searching was slower
with the BM25 parameters changed.
However, I think part of the problem was with the database files as
well, as we got the segfault with specific combinations of databases
and not with others, but this could be a coincidence.
Let me know if you manage to hunt down the rogue parameter and we can
test again to see the effect.
The main problem remains the extremely long search time for certain
search terms. I hope it's related...

Regards,
Ron


Olly Betts wrote:

> On Thu, Oct 25, 2007 at 08:27:33PM +0200, Ron Kass wrote:
>   
>> By the way, this is the weight scheme we used.
>> # Governs the importance of within document frequency.
>> # Must be >= 0. 0 means ignore wdf. Default is 1.
>> my $k1 = 1;
>> # Compensation factor for the high wdf values in large documents.
>> # Must be >= 0. 0 means no compensation. Default is 0.
>> my $k2 = 25;
>> # Governs the importance of within query frequency.
>> # Must be >= 0. 0 means ignore wqf. Default is 1.
>> my $k3 = 1;
>> # Relative importance of within document frequency and document
>> # length. Must be >= 0 and <= 1. Default is 0.5.
>> my $b = 0.01;
>> # Specifies a cutoff on the minimum value that can be used for a
>> # normalised document length - smaller values will be forced up to
>> # this cutoff. This prevents very small documents getting a huge
>> # bonus weight. Default is 0.5.
>> my $min_normlen = 0.5;
>> [...]
>> Then we went back to the old server.. Same speed as before (0.9-1.0sec 
>> per search) and this time estimates are stable. So, weight scheme is the 
>> cause of the inaccurate estimates. Why?
>>     
>
> I've made some progress here.  It looks like there's a bug in BM25Weight
> where one of the statistics isn't being set correctly, but by default
> this doesn't matter since k2 is 0.  If k2 is set to non-zero (as you've
> done) then this manifests as an unpredictable factor in the weights.
>
> I've not yet tracked down where this value comes from, but it shouldn't
> take long now I've got it happening in front of me with a four line
> testcase!
>
> Was the segfaulting case also using BM25 with non-zero k2?
>
> Cheers,
>     Olly
>   
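
For reference, the parameter ranges stated in the quoted comments can be
checked with a few lines of plain Perl. This is only a rough sketch: the
helper names check_bm25_params and clamp_normlen are hypothetical and are
not part of Search::Xapian, and the code only mirrors the ranges and the
min_normlen cutoff described above; it does not reproduce how Xapian
computes weights.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Xapian API): returns a list of problems
# found when checking the ranges given in the quoted comments.
sub check_bm25_params {
    my ($k1, $k2, $k3, $b_val) = @_;
    my @problems;
    push @problems, "k1 must be >= 0" if $k1 < 0;
    push @problems, "k2 must be >= 0" if $k2 < 0;
    push @problems, "k3 must be >= 0" if $k3 < 0;
    push @problems, "b must be >= 0 and <= 1" if $b_val < 0 || $b_val > 1;
    return @problems;
}

# Hypothetical helper: a normalised document length below the cutoff
# is forced up to the cutoff, as the min_normlen comment describes.
sub clamp_normlen {
    my ($normlen, $min_normlen) = @_;
    return $normlen < $min_normlen ? $min_normlen : $normlen;
}

# The values from the quoted message: k1, k2, k3, b.
my @params   = (1, 25, 1, 0.01);
my @problems = check_bm25_params(@params);
my $report   = @problems ? join("\n", @problems) : "parameters are in range";
print "$report\n";

# k2 is non-zero here, which is the case the bug report is about.
print "k2 is non-zero\n" if $params[1] != 0;
```

All four values pass the range check, so the settings themselves are
legal; the only unusual one is the non-zero k2.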

