[Xapian-discuss] Lucene ranking formula
James Aylett
james-xapian at tartarus.org
Wed Nov 10 11:29:24 GMT 2004
No mention of where it came from, but:
----------------------------------------------------------------------
For the record, Lucene's scoring algorithm is, roughly:
score_d = sum_t( tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t)
where:
score_d : score for document d
sum_t : sum for all terms t
tf_q : the square root of the frequency of t in the query
tf_d : the square root of the frequency of t in d
idf_t : log(numDocs/(docFreq_t+1)) + 1.0
numDocs : number of documents in index
docFreq_t : number of documents containing t
norm_q : sqrt(sum_t((tf_q*idf_t)^2))
norm_d_t : square root of number of tokens in d in the same field
as t
(I hope that's right!)
[Doug later added...]
Make that:
score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t *
boost_t) * coord_q_d
where
boost_t : the user-specified boost for term t
coord_q_d : number of terms in both query and document / number of
terms in query
The coordination factor gives an AND-like boost to documents that
contain, e.g., all three terms in a three word query over those that
contain just two of the words.
----------------------------------------------------------------------
<http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.search&toc=faq#q31>
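
To see how the pieces fit together, here is a rough Python sketch of the formula as quoted above. It is not Lucene's code or API: the function and parameter names (idf, score, doc_freqs, boosts) are made up for illustration. It assumes a single field, reads the idf term as numDocs/(docFreq_t+1), counts distinct terms for coord_q_d, and defaults boost_t to 1.0.

```python
import math
from collections import Counter


def idf(num_docs, doc_freq):
    # Assumption: the "+1" belongs to the denominator, i.e. numDocs / (docFreq_t + 1).
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def score(query_terms, doc_terms, doc_freqs, num_docs, boosts=None):
    """Score one document against one query, single field only (an assumption)."""
    boosts = boosts or {}
    q_counts = Counter(query_terms)
    d_counts = Counter(doc_terms)

    # Query-side weight per distinct term: tf_q * idf_t
    q_weights = {
        t: math.sqrt(n) * idf(num_docs, doc_freqs.get(t, 0))
        for t, n in q_counts.items()
    }
    # norm_q = sqrt(sum_t((tf_q * idf_t)^2))
    norm_q = math.sqrt(sum(w * w for w in q_weights.values()))
    # norm_d_t = square root of the number of tokens in d (one field here)
    norm_d = math.sqrt(len(doc_terms))
    if norm_q == 0 or norm_d == 0:
        return 0.0

    total = 0.0
    matched = 0
    for t, w_q in q_weights.items():
        if d_counts[t] == 0:
            continue  # term absent from the document contributes nothing
        matched += 1
        tf_d = math.sqrt(d_counts[t])
        idf_t = idf(num_docs, doc_freqs.get(t, 0))
        boost_t = boosts.get(t, 1.0)  # assumed default boost of 1.0
        total += (w_q / norm_q) * (tf_d * idf_t / norm_d) * boost_t

    # coord_q_d = terms in both query and document / terms in query
    coord = matched / len(q_counts)
    return total * coord


query = ["quick", "brown", "fox"]
doc_freqs = {"quick": 10, "brown": 10, "fox": 10}
s_all = score(query, ["the", "quick", "brown", "fox"], doc_freqs, 100)
s_two = score(query, ["the", "quick", "brown", "dog"], doc_freqs, 100)
print(s_all / s_two)
```

With equal document lengths and equal document frequencies, each matching term contributes the same amount, so the three-term match scores 3 * 1 against 2 * (2/3) for the two-term match, a ratio of 2.25. That is the AND-like effect of the coordination factor.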
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org