[Xapian-discuss] Similarity Measures

Gavin Mendel-Gleason plywn at yahoo.com
Tue Jun 27 17:21:47 BST 2006


I'm currently trying to implement a similarity measure
for Xapian.  Ideally I'd like to be able to calculate
the following:

For documents i and j:

s_ij =  a_ij / ( L_i + L_j + a_ij) 

where L_i is the number of terms in document i
and a_ij is:

a_ij = Sum_k [ t_ik t_jk ]

where t_ik is 1 if term k occurs in document i, and 0 otherwise.
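
To make the definition concrete, here is a minimal sketch of the measure in plain Python (no Xapian calls). It follows the formula exactly as written above, including the "+ a_ij" in the denominator. Two things are assumptions for illustration only: the function name `similarity`, and reading L_i as the number of distinct terms in document i.

```python
def similarity(doc_i, doc_j):
    """Compute s_ij = a_ij / (L_i + L_j + a_ij) for two term lists.

    Assumption: L_i counts distinct terms, so each document is
    reduced to its set of terms before counting.
    """
    terms_i = set(doc_i)
    terms_j = set(doc_j)

    # a_ij = Sum_k t_ik * t_jk, which is the number of shared terms,
    # because t_ik * t_jk is 1 only when term k occurs in both documents.
    a_ij = len(terms_i & terms_j)

    L_i = len(terms_i)
    L_j = len(terms_j)

    denom = L_i + L_j + a_ij
    # Two empty documents give a zero denominator; return 0.0 in that case.
    return a_ij / denom if denom else 0.0


if __name__ == "__main__":
    print(similarity(["a", "b", "c"], ["b", "c", "d"]))
```

With the two example documents, a_ij = 2 and L_i = L_j = 3, so the value is 2 / (3 + 3 + 2).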

From looking at the source code for weights, it appears
that the sum should be cut up into pieces that can be
calculated incrementally.  Is it possible to calculate
this value within the current weight framework?
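
One way to picture "cut up into pieces" is sketched below in plain Python. It is not Xapian code and says nothing about whether the weight framework allows it. Each term of document i contributes 1 to a_ij for every other document j containing that term, and the lengths L_i and L_j are only applied once the sum is complete. The names `build_postings` and `similarities_to`, the dictionary-based index, and the reading of L_i as a count of distinct terms are all assumptions for illustration.

```python
from collections import defaultdict


def build_postings(docs):
    """Map each term to the set of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in set(terms):
            postings[term].add(doc_id)
    return postings


def similarities_to(query_id, docs):
    """Compute s_ij for every document j sharing a term with document i.

    a_ij is accumulated one term at a time; the normalisation by
    L_i + L_j + a_ij happens only after the whole sum is known.
    """
    postings = build_postings(docs)
    # Assumption: L counts distinct terms per document.
    lengths = {doc_id: len(set(terms)) for doc_id, terms in docs.items()}

    shared = defaultdict(int)  # running value of a_ij for each j
    for term in set(docs[query_id]):
        for doc_id in postings[term]:
            if doc_id != query_id:
                shared[doc_id] += 1  # per-term piece of the sum

    L_i = lengths[query_id]
    return {j: a / (L_i + lengths[j] + a) for j, a in shared.items()}


if __name__ == "__main__":
    docs = {1: ["a", "b", "c"], 2: ["b", "c", "d"], 3: ["x"]}
    print(similarities_to(1, docs))
```

Note that the per-term pieces add up to a_ij, not to s_ij: because a_ij also appears in the denominator, the final division cannot itself be split into independent per-term contributions.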

