[Xapian-discuss] Round up Relevance Value while sorting result

Olly Betts olly at survex.com
Wed Feb 7 06:45:40 GMT 2007


On Mon, Jan 29, 2007 at 10:48:28PM +0800, Andrey wrote:
> I'd like to round the relevance value while sorting the results,
> to achieve something like "sort by <relevance in %>"
> 
> then, I can have "sort_by_<relevance in %>_then_value(date)"
> 
> Is it something to do with the weighting_scheme?

No.  The weighting scheme determines the weight contributed to each
document by each term.  These are summed and then once all the
documents' weights are known, the percentages are calculated from the
weight of each document relative to the highest weighted document we've
seen.

If all terms in the query match the highest weighted document, this
document would get 100%.  If not all terms match, then the percentages
are scaled down by a factor determined from which terms do match it.
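
As a rough illustration of the calculation described above, here is a
minimal Python sketch.  It is not Xapian's actual code: the function name,
the `top_match_factor` parameter and the use of round() are all assumptions
made for the example, and how the real scaling factor is derived from the
matching terms is not shown here.

```python
def to_percentages(weights, top_match_factor=1.0):
    """Convert summed document weights to integer percentages.

    weights          -- summed weight of each matching document
    top_match_factor -- hypothetical scaling factor (0..1]; 1.0 models the
                        case where the top document matches every query term
    """
    top = max(weights)  # the highest weighted document seen
    return [round(100.0 * w / top * top_match_factor) for w in weights]


# Top document matches all query terms, so it gets 100%.
all_terms = to_percentages([12.0, 6.0, 3.0])

# Top document misses some terms, so every percentage is scaled down.
some_terms = to_percentages([12.0, 6.0, 3.0], top_match_factor=0.8)
```

Note that the percentages only exist after every document's weight is
known, which is why a weighting scheme cannot produce them itself.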

The actual values of the percentages are somewhat meaningless in fact
- the probabilistic weighting formulae come from Bayes' theorem
originally, but the probabilities are transformed in non-linear
ways in the derivations, so there's no mathematical justification
for saying an 80% document is twice as likely to be useful as a 40% one
(but it should be better).  However, the percentage can still be a useful
thing to display because it gives the user some measure they can easily
appreciate.  You could also use a star rating or a set of icons with an
obvious ordering.

So I don't think you could write a weighting scheme which produced
rounded integer percentage weights.  But you can write a weighting
scheme which gives more documents the same weight as each other so they
then get ordered by date.  In fact TradWeight should do, or you can
adjust the parameters used by BM25Weight.
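
To see why a coarser weighting scheme lets the date take over, here is a
toy Python sketch.  It does not use the Xapian API: the documents, the
"fine" and "coarse" weights and the rank() helper are all made up for
illustration, and the coarse weights are not what TradWeight or BM25Weight
would actually compute.

```python
from datetime import date

# Hypothetical documents with two sets of made-up weights.
docs = [
    {"id": "a", "fine": 7.31, "coarse": 2, "date": date(2007, 1, 10)},
    {"id": "b", "fine": 7.29, "coarse": 2, "date": date(2007, 2, 1)},
    {"id": "c", "fine": 4.02, "coarse": 1, "date": date(2007, 1, 20)},
]


def rank(docs, weight_key):
    """Order by weight (highest first), then by date (newest first)."""
    ordered = sorted(
        docs,
        key=lambda d: (d[weight_key], d["date"]),
        reverse=True,
    )
    return [d["id"] for d in ordered]


# Fine-grained weights almost never tie, so the date is never consulted.
fine_order = rank(docs, "fine")

# Coarse weights tie for "a" and "b", so the newer document "b" wins.
coarse_order = rank(docs, "coarse")
```

With the fine-grained weights the order is decided by weight alone; with
the coarse ones, "a" and "b" share a weight and the secondary date key
decides between them.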

Cheers,
    Olly


