[Xapian-devel] Handling Negative value due to logarithm of probabilities.
Gaurav Arora
gauravarora.daiict at gmail.com
Sat Apr 28 21:36:31 BST 2012
>
> So if we set:
>
> K = doc_length_upper_bound
>
> we can ensure that K.Pi >= 1 and not have to worry about clamping the
> log to be non-negative.
> So it looks like we can actually pick a non-insane K which will ensure
> we never clamp. Maybe that would be inefficient though, and actually a
> smaller K would work equally well for retrieval, yet be faster.
>
Yes, I think this will serve as a very good starting value for *K*. Later,
as I am also planning to write accuracy tests that check the weighting
scheme against values in the literature, I think we can run those tests
with various values of *K* and see how *K* affects retrieval performance
and run time. That way we can find a value of *K* that is more efficient
than *K = doc_length_upper_bound* without compromising retrieval performance.
I think that even then it would be a good idea to allow the user to
specify the value of *K*, keeping the value we find as the default.
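To make the clamping point concrete, here is a minimal sketch in Python (not Xapian code). It assumes the per-term probability is the unsmoothed wdf / doc_length, and the names shifted_weight and was_clamped, as well as all the numbers, are made up for illustration:

```python
import math

def shifted_weight(p, k):
    """Per-term weight log(k * p), clamped so it is never negative."""
    return max(0.0, math.log(k * p))

def was_clamped(p, k):
    """True when log(k * p) would be negative, i.e. k * p < 1."""
    return k * p < 1.0

# Hypothetical numbers, for illustration only.
doc_length_upper_bound = 1000
wdf, doc_length = 3, 800
p = wdf / doc_length  # assumed unsmoothed term probability

# Compare a smaller K with K = doc_length_upper_bound.
for k in (100, doc_length_upper_bound):
    print(k, was_clamped(p, k), shifted_weight(p, k))
```

Under these assumptions the smaller K forces a clamp for this term while K = doc_length_upper_bound does not, which is the trade-off the accuracy tests would have to measure.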
I was thinking about the scheme and had this thought:
If we consider two documents, with *document 1* matching (containing) 2 query
terms and *document 2* matching (containing) 3 query terms, then the
virtual weight function would be equivalent to:
Wdocument1' = Wdocument1(orig) + 2 log(K) = log(K.P1) + log(K.P2)
Wdocument2' = Wdocument2(orig) + 3 log(K) = log(K.P1) + log(K.P2) + log(K.P3)
I hope this is what would happen, since for *document 1* the matcher would
call the Weight class 2 times and for *document 2* it would call the Weight
class 3 times.
Since *document 2* has more matching terms, it should probably be ranked
higher. But if the wdf values of the terms present in *document 1* were much
higher, so that *document 1* should actually outrank *document 2*, then this
ranking won't be appropriate: *document 2* will still be ranked higher
because of the large *K* value being added.
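The concern can be checked with a small sketch in Python (again not Xapian code). It assumes each document's score is just the sum of the per-term log weights over its matching terms; the probabilities and the function names original_weight and shifted_weight are hypothetical:

```python
import math

def original_weight(probs):
    """Sum of log(P_i) over the matching terms of one document."""
    return sum(math.log(p) for p in probs)

def shifted_weight(probs, k):
    """Sum of log(K * P_i), i.e. original_weight + n * log(K) for n terms."""
    return sum(math.log(k * p) for p in probs)

# Hypothetical per-term probabilities, for illustration only.
doc1 = [0.05, 0.04]        # 2 matching terms, higher wdf
doc2 = [0.02, 0.02, 0.02]  # 3 matching terms, lower wdf

print("orig", original_weight(doc1), original_weight(doc2))
for k in (100, 1000):
    print(k, shifted_weight(doc1, k), shifted_weight(doc2, k))
```

With these made-up numbers document 1 scores higher for K = 100, but document 2 overtakes it once K exceeds 250, so the relative ranking depends on the choice of K and not only on the term probabilities.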
Thanks,
--
with regards
Gaurav A.