[Xapian-devel] Handling Negative value due to logarithm of probabilities.

Gaurav Arora gauravarora.daiict at gmail.com
Sat Apr 28 21:36:31 BST 2012


>
> So if we set:
>
> K = doc_length_upper_bound
>
> we can ensure that K.Pi >= 1 and not have to worry about clamping the
> log to be non-negative.


> So it looks like we can actually pick a non-insane K which will ensure
> we never clamp.  Maybe that would be inefficient though, and actually a
> smaller K would work equally well for retrieval, yet be faster.
>
Yes, I think this will serve as a very good starting value for K. I am also
planning to write accuracy tests that check each weighting scheme against the
values reported in the literature, so I could then run those tests with
various values of K and see how K affects retrieval performance and run time.
That way we may find a value of K which is more efficient than
K = doc_length_upper_bound without compromising retrieval performance.

I think that even then it would be a good idea to let the user specify the
value of K, keeping the value we find as the default.

I was also thinking about the scheme and had the following thought:

If we consider two documents, where document 1 matches (contains) 2 query
terms and document 2 matches (contains) 3 query terms, then the virtual
weight function would be equivalent to:

Wdocument1` = Wdocument1(orig) + 2log(K)  = log(K.P1) + log(K.P2)

Wdocument2` = Wdocument2(orig) + 3log(K)  = log(K.P1) + log(K.P2) + log(K.P3)

I expect this to happen because the matcher would call the Weight class
2 times for document 1 and 3 times for document 2.
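
As a quick sanity check of the identity above, here is a minimal sketch in
plain Python (not Xapian code). The probabilities and the value of K are
made-up illustrative numbers, and the function names are hypothetical. It
only confirms that scaling each probability by K adds log(K) once per
matching term.

```python
import math

# Hypothetical scaling constant, standing in for doc_length_upper_bound.
K = 1000.0

def original_weight(probs):
    """Sum of log-probabilities over the matching terms (may be negative)."""
    return sum(math.log(p) for p in probs)

def shifted_weight(probs, k=K):
    """Sum of log(k * p) over the matching terms."""
    return sum(math.log(k * p) for p in probs)

# Made-up per-term probabilities: document 1 matches 2 terms, document 2 matches 3.
doc1_probs = [0.02, 0.01]
doc2_probs = [0.004, 0.003, 0.002]

for probs in (doc1_probs, doc2_probs):
    n = len(probs)
    # The shift equals n * log(K), where n is the number of matching terms.
    assert math.isclose(shifted_weight(probs),
                        original_weight(probs) + n * math.log(K))
```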

Since document 2 has more matching terms, it should probably be ranked
higher. But if the wdf values of the terms present in document 1 were high
enough that document 1 should actually outrank document 2, then the ranking
would not be appropriate: document 2 would still be ranked higher because of
the extra log(K) it receives when K is large.
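
To illustrate the concern, here is a small sketch in plain Python (again not
Xapian code) with made-up probabilities and hypothetical names. It shows only
that the relative order of the two documents can depend on K. It does not say
which order is the right one.

```python
import math

def shifted_weight(probs, k):
    """Sum of log(k * p) over the terms a document matches."""
    return sum(math.log(k * p) for p in probs)

# Made-up per-term probabilities. Document 1 matches 2 terms with fairly
# high probabilities; document 2 matches 3 terms with lower ones.
docs = {
    "doc1": [0.02, 0.01],
    "doc2": [0.004, 0.003, 0.002],
}

def rank(k):
    """Return document names ordered by descending shifted weight."""
    return sorted(docs, key=lambda name: shifted_weight(docs[name], k),
                  reverse=True)

# With these numbers the order flips once K exceeds
# (0.02 * 0.01) / (0.004 * 0.003 * 0.002), which is about 8333.
print(rank(1000.0))   # document 1 first
print(rank(10000.0))  # document 2 first
```

Both values of K keep every K.Pi >= 1 for these numbers, so no clamping is
involved in either ranking.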


Thanks,

-- 
with regards
Gaurav A.