[Xapian-discuss] scoring question

Olly Betts olly at survex.com
Wed Mar 21 18:48:13 GMT 2007

On Wed, Mar 21, 2007 at 06:41:11PM +0000, Richard Boulton wrote:
> Olly Betts wrote:
> >I've wondered before if wdf should be forced to always be at least one,
> >but it's perhaps useful to be able to add boolean terms without
> >affecting the document length.
> I wonder if the BM25 weight class should clip it to a lower bound of 1, 
> but it should be allowed to have a value of 0 in the database - such 
> that the document length is not affected.

Interesting idea.  Probably better to just always clip the value to be
at least 1 for any weighting class if we do this.
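
To make the idea concrete, here is a minimal sketch of clipping at weighting time. It assumes a simplified textbook BM25 term weight, not Xapian's actual BM25 code; the names clipped_wdf and bm25_term_weight and the parameters k1, b and idf are hypothetical and only for illustration. The stored wdf is left untouched and only the value fed into the formula is raised to 1.

```python
def clipped_wdf(wdf: int) -> int:
    """Return the wdf to use for weighting: the stored value, but never below 1."""
    return max(wdf, 1)


def bm25_term_weight(wdf: int, doclen: float, avg_doclen: float,
                     idf: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Simplified BM25 term weight (an assumption, not Xapian's exact formula).

    The wdf is clipped before use, so a term stored with wdf 0 is weighted
    as if it occurred once, while doclen still reflects the stored values.
    """
    wdf = clipped_wdf(wdf)
    # Guard against an empty collection; treat the document as average length.
    norm_len = doclen / avg_doclen if avg_doclen > 0 else 1.0
    denom = wdf + k1 * ((1.0 - b) + b * norm_len)
    return idf * wdf * (k1 + 1.0) / denom


# A term stored with wdf 0 gets the same weight as one stored with wdf 1.
weight_zero = bm25_term_weight(0, doclen=10, avg_doclen=10, idf=1.0)
weight_one = bm25_term_weight(1, doclen=10, avg_doclen=10, idf=1.0)
```

Putting the clip in one shared helper rather than inside a single weighting class matches the suggestion of applying it to any weighting class.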

It might invalidate certain assumptions which we might currently be
making (e.g. wdf <= doclen would no longer necessarily be true).  It's
been a while since I looked at the formulae so perhaps none of the
assumptions we actually make would be affected.  I'd like to start
storing bounds on maximum and minimum values for wdf and other things, at
which point we can probably eliminate any problematic assumptions.
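
A small sketch of how the wdf <= doclen assumption could break, and how a stored lower bound might flag it. It assumes document length is the sum of the stored wdf values (consistent with a wdf of 0 not affecting the length); WdfBounds, doc_length, effective_wdf and the term name XFILTER are hypothetical, not part of Xapian.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WdfBounds:
    """Running minimum and maximum of the stored wdf values seen so far."""
    lower: Optional[int] = None
    upper: Optional[int] = None

    def update(self, wdf: int) -> None:
        self.lower = wdf if self.lower is None else min(self.lower, wdf)
        self.upper = wdf if self.upper is None else max(self.upper, wdf)


def doc_length(stored_wdfs: Dict[str, int]) -> int:
    # Assumption: document length is the sum of the stored wdf values.
    return sum(stored_wdfs.values())


def effective_wdf(stored_wdf: int) -> int:
    # The wdf used for weighting, clipped to a lower bound of 1.
    return max(stored_wdf, 1)


# A document carrying only a boolean filter term stored with wdf 0.
doc = {"XFILTER": 0}
bounds = WdfBounds()
for wdf in doc.values():
    bounds.update(wdf)

length = doc_length(doc)                 # 0: the boolean term adds nothing
clipped = effective_wdf(doc["XFILTER"])  # 1 after clipping
assumption_holds = clipped <= length     # False: clipped wdf exceeds doclen

# With a stored lower bound, code can tell cheaply whether clipping can
# ever change a value, and so whether wdf <= doclen can still be relied on.
needs_clipping = bounds.lower is not None and bounds.lower < 1
```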


