[Xapian-discuss] document weight
Olly Betts
olly at survex.com
Tue May 24 14:42:38 BST 2005
On Tue, May 24, 2005 at 03:34:11AM +0000, Sabrina Shen wrote:
> (1) From documentation I know Xapian employs BM25 to estimate weights
> of query terms and documents. But how does it ensure that the final
> weight for a record scales from 0 to 1? It seems to me that
> Xapian::BM25Weight::get_sumpart could become larger than 1? Did I
> misunderstand anything?
The BM25 weight *can* be larger than 1. However that doesn't mean we
can't produce a percentage score between 0 and 100...
If the highest ranking document matches all the terms in the query, then
we simply divide all weights by that document's weight and multiply by
100% to give the percentage score.
If the highest ranking document doesn't match all terms, we simply
multiply by less than 100% instead. The factor to multiply by is
determined by looking at which terms match.
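
As a rough illustration of the scaling described above, here is a minimal
Python sketch. It is not Xapian's actual code: the function name
percent_scores, the term_weights mapping, and in particular the way the
factor is computed (weight of the terms the top document matches, divided
by the weight of all query terms) are assumptions made for the example.
All the message itself says is that the factor depends on which terms
match.

```python
def percent_scores(weights, top_matched_terms, term_weights):
    """Scale raw weights so that the top document gets at most 100%.

    weights           -- raw (e.g. BM25) weight of each matching document
    top_matched_terms -- set of query terms the top document matches
    term_weights      -- hypothetical per-term weight for every query term
    """
    if not weights:
        return []
    top = max(weights)
    if top <= 0:
        return [0.0 for _ in weights]

    # Assumed factor: share of the total term weight that the top
    # document actually matches. It is 1.0 when every term matches.
    total = sum(term_weights.values())
    matched = sum(w for t, w in term_weights.items() if t in top_matched_terms)
    factor = matched / total if total > 0 else 1.0

    # Divide by the top weight, then multiply by (at most) 100%.
    return [100.0 * factor * w / top for w in weights]


raw = [4.0, 2.0, 1.0]                       # raw weights may exceed 1
term_weights = {"class": 2.0, "list": 2.0}  # made-up values

all_terms = percent_scores(raw, {"class", "list"}, term_weights)
one_term = percent_scores(raw, {"class"}, term_weights)
```

With these made-up numbers the top document gets 100% when it matches both
terms and 50% when it matches only one; the other documents keep their
weight relative to the top one.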
> (2) Also for a quick search with "class OR list", I thought I would get the
> following three records the same weight "100%". But I was wrong. What
> can be the factors influencing this?
Those factors are the within document frequencies (wdfs) of the two
terms, and the document lengths.
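
To see how those two factors can pull the weights apart, here is a sketch
of the textbook BM25 per-term weight in Python. The function name, the
collection statistics, and the k1 and b values are illustrative
assumptions; Xapian's BM25Weight may differ in detail.

```python
import math


def bm25_term_weight(wdf, doc_len, avg_doc_len, n_docs, term_freq,
                     k1=1.0, b=0.5):
    """Textbook BM25 weight of one term in one document.

    wdf       -- within-document frequency of the term
    doc_len   -- length of this document
    term_freq -- number of documents in the collection containing the term
    """
    # Rarer terms get a larger inverse document frequency.
    idf = math.log((n_docs - term_freq + 0.5) / (term_freq + 0.5))
    # Document length relative to the collection average.
    norm_len = doc_len / avg_doc_len
    return idf * (k1 + 1) * wdf / (k1 * ((1 - b) + b * norm_len) + wdf)


# Same term, same collection; only wdf and document length vary.
stats = dict(avg_doc_len=100, n_docs=1000, term_freq=10)
short_doc = bm25_term_weight(wdf=2, doc_len=50, **stats)
long_doc = bm25_term_weight(wdf=2, doc_len=400, **stats)
frequent = bm25_term_weight(wdf=6, doc_len=50, **stats)
```

Here the long document scores lower than the short one for the same wdf,
and a higher wdf scores higher for the same length, so documents matching
the same terms of "class OR list" need not get the same weight.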
And it seems to be working here - the top matching document is the class
list for the whole sources.
Cheers,
Olly