[Xapian-discuss] document weight

Olly Betts olly at survex.com
Tue May 24 14:42:38 BST 2005


On Tue, May 24, 2005 at 03:34:11AM +0000, Sabrina Shen wrote:
> (1) From documentation I know Xapian employs BM25 to estimate weights
> of query terms and documents.  But how does it ensure that the final
> weight for a record scales from 0 to 1?  It seems to me that
> Xapian::BM25Weight::get_sumpart could become larger than 1? Did I
> misunderstand anything?

The BM25 weight *can* be larger than 1.  However that doesn't mean we
can't produce a percentage score between 0 and 100...

If the highest ranking document matches all the terms in the query, then
we simply divide every document's weight by the weight of that top
document and multiply by 100% to give the percentage score.

If the highest ranking document doesn't match all the terms, we scale so
that it gets less than 100% instead.  The factor to multiply by is
determined by looking at which of the query terms it matches.
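
To make the scaling concrete, here's a rough sketch in Python.  It is not
Xapian's actual code: the function name, the argument names and the
per-term "maximum weight" table are all made up for illustration, and the
way the factor is derived from the matching terms (their share of the
total of those per-term maximum weights) is an assumption about one
reasonable way to do it.

```python
def percent_scores(weights, top_matched_terms, term_max_weights):
    """Turn raw match weights into percentages (illustrative sketch only).

    weights           -- raw weight of each matching document
    top_matched_terms -- query terms the highest weighted document matches
    term_max_weights  -- hypothetical upper bound on each query term's weight
    """
    if not weights:
        return []
    top = max(weights)
    if top <= 0:
        return [0.0 for _ in weights]

    # Assumed rule: the top document earns the share of the query's total
    # term weight that its matching terms account for.
    total = sum(term_max_weights.values())
    matched = sum(term_max_weights[t] for t in top_matched_terms)
    factor = matched / total if total > 0 else 1.0

    # Divide by the top weight, then scale to a percentage.
    return [100.0 * factor * w / top for w in weights]


term_max = {"class": 3.0, "list": 1.0}

# Top document matches both terms, so it gets 100%.
print(percent_scores([4.0, 2.0, 1.0], ["class", "list"], term_max))
# -> [100.0, 50.0, 25.0]

# Top document matches only "class", so it gets less than 100%.
print(percent_scores([4.0, 2.0, 1.0], ["class"], term_max))
# -> [75.0, 37.5, 18.75]
```

Either way the BM25 weights themselves are untouched and can exceed 1;
only the displayed percentage is normalised.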

> (2) Also for a quick search with "class OR list", I thought I would get the
> following three records the same weight "100%". But I was wrong. What
> can be the factors influencing this?

Those factors are the within-document frequencies (wdfs) of the two
terms in each document, and the document lengths.
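
As a rough illustration of why three documents matching the same terms
can still score differently, here's the textbook form of the BM25 term
weight in Python.  This is a sketch of the general formula, not the exact
expression Xapian::BM25Weight evaluates (which has further parameters);
the function name, argument names and the k1 and b defaults are
assumptions chosen for the example.

```python
import math


def bm25_term_weight(wdf, doc_len, avg_doc_len, n_docs, term_freq,
                     k1=1.2, b=0.75):
    """Textbook BM25 weight of one term in one document (sketch only).

    wdf         -- within-document frequency of the term
    doc_len     -- length of this document
    avg_doc_len -- average document length in the collection
    n_docs      -- number of documents in the collection
    term_freq   -- number of documents containing the term
    """
    # Rarer terms get a larger inverse document frequency component.
    idf = math.log((n_docs - term_freq + 0.5) / (term_freq + 0.5))
    # Documents longer than average are penalised via the normalised length.
    norm_len = doc_len / avg_doc_len
    return idf * (k1 + 1) * wdf / (k1 * ((1 - b) + b * norm_len) + wdf)


def or_query_weight(wdfs, doc_len, avg_doc_len, n_docs, term_freqs):
    """Weight for an OR query: sum the weights of the terms that occur."""
    return sum(
        bm25_term_weight(wdfs[t], doc_len, avg_doc_len, n_docs, term_freqs[t])
        for t in wdfs if wdfs[t] > 0
    )


freqs = {"class": 50, "list": 80}
# Same terms matched, but different wdfs and lengths give different weights.
short_doc = or_query_weight({"class": 3, "list": 2}, 100, 200, 1000, freqs)
long_doc = or_query_weight({"class": 3, "list": 2}, 800, 200, 1000, freqs)
sparse_doc = or_query_weight({"class": 1, "list": 1}, 100, 200, 1000, freqs)
print(short_doc, long_doc, sparse_doc)
```

With these made-up numbers the short document with the higher wdfs comes
out on top, and only the top document is reported as 100%.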

And it seems to be working here - the top matching document is the class
list for the whole sources.

Cheers,
    Olly


