[Xapian-discuss] Re: document weight

Sabrina Shen hm2shen at yahoo.com
Thu May 26 01:29:03 BST 2005


Thanks. Below are my responses:

Olly Betts <olly <at> survex.com> writes:
 
> The BM25 weight *can* be larger than 1.  However that doesn't mean we
> can't produce a percentage score between 0 and 100...
> 
> If the highest ranking document matches all the terms in the query, then
> we simply divide all weights by this and multiply by 100% to give the
> percentage score.
Yes, it makes sense.
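
As I understand that case, it can be sketched as below. This is only my
own illustration in Python, not Xapian's code; the function name
to_percent and the sample weights are made up for the example.

```python
def to_percent(weights):
    """Scale raw document weights so the top-ranked document scores 100%.

    Assumes the top-ranked document matches every query term.
    """
    top = max(weights, default=0.0)
    if top <= 0.0:
        # Nothing matched with a positive weight, so every score is 0%.
        return [0.0 for _ in weights]
    # Divide each weight by the highest weight and multiply by 100.
    return [100.0 * w / top for w in weights]


# Made-up BM25 weights; note that they are larger than 1.
scores = to_percent([4.0, 2.0, 1.0])
```

So the weights themselves can be any positive size, and only their ratio
to the top weight matters for the percentage.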

> If the highest ranking document doesn't match all terms, we simply
> multiply by less than 100%.  The score to multiply by is determined
> by looking at which terms match.
I'm a little confused. More specifically, what do you mean by "by looking
at which terms match"? For example, suppose we search with terms t1, t2, t3.
A document D1 contains t1 and t2, so we get term weight tw1 for D1 with t1
and tw2 for D1 with t2, and BM25 finally gives document weight DW1.
Similarly, a document D2 contains t2 and t3, so we get term weight tw2' for
D2 with t2 and tw3' for D2 with t3, and BM25 finally gives document weight
DW2. How would we compute the final percentage score in this case?
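
To make the question concrete, here is one possible reading written as a
Python sketch. It is only my guess and is not taken from the Xapian source.
I am assuming that each query term has some upper bound on its weight (the
hypothetical max_term_weight below), and that the top document's percentage
is the share of the total bound covered by the terms it matches. All names
and numbers are invented for illustration.

```python
def percent_scores(doc_weights, doc_terms, max_term_weight):
    """Guess at percentage scoring when the top document misses some terms.

    doc_weights:     {doc: BM25 document weight}
    doc_terms:       {doc: set of query terms the document matches}
    max_term_weight: {term: assumed upper bound on that term's weight}
    """
    if not doc_weights:
        return {}
    top_doc = max(doc_weights, key=doc_weights.get)
    top_weight = doc_weights[top_doc]
    total = sum(max_term_weight.values())
    if top_weight <= 0.0 or total <= 0.0:
        return {doc: 0.0 for doc in doc_weights}
    # Assumption: the top document gets less than 100% in proportion to
    # the weight of the query terms it fails to match.
    matched = sum(max_term_weight[t] for t in doc_terms[top_doc])
    top_percent = 100.0 * matched / total
    # Every other document is scaled relative to the top document.
    return {doc: top_percent * w / top_weight
            for doc, w in doc_weights.items()}


# D1 matches t1 and t2; D2 matches t2 and t3, as in my example above.
result = percent_scores(
    doc_weights={"D1": 3.0, "D2": 1.5},
    doc_terms={"D1": {"t1", "t2"}, "D2": {"t2", "t3"}},
    max_term_weight={"t1": 2.0, "t2": 1.0, "t3": 1.0},
)
```

With these numbers D1 would get 75% rather than 100%, because t3 carries a
quarter of the total, and D2 would get half of that. Is this roughly what
happens?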

Is MSet::convert_to_percent the place I should look?

Thanks.

Sabrina