Weighting Schemes: Evaluation results
Vivek Pal
vivekpal.dtu at gmail.com
Sun Aug 7 19:02:27 BST 2016
Hi,
Evaluation of pivoted normalization ("PPP") for the tf-idf weighting scheme
is now complete as well. I have also evaluated the default tf-idf
normalization ("ntn") and the combinations that apply pivoted normalization
to a single component, i.e. to the wdfn, idfn or wtn component, written as
the "Pxx", "xPx" and "xxP" normalization strings respectively. This gives a
clear idea of which one does a better job of retrieving relevant documents.
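To make the string convention concrete, here is a minimal sketch that maps each position of a three-letter normalization string to the component it controls. The helper names (`describe`, `pivoted_components`) are hypothetical and are not part of Xapian; the only facts taken from the post are the component order (wdfn, idfn, wtn) and that "P" marks pivoted normalization.

```python
# Component order as described in the post: first letter is wdfn,
# second is idfn, third is wtn.
COMPONENTS = ("wdfn", "idfn", "wtn")


def describe(normalization):
    """Map each component name to its letter in the normalization string."""
    if len(normalization) != 3:
        raise ValueError("expected exactly three characters")
    return dict(zip(COMPONENTS, normalization))


def pivoted_components(normalization):
    """Return the components that use pivoted normalization ('P')."""
    return [name for name, letter in describe(normalization).items()
            if letter == "P"]


if __name__ == "__main__":
    for string in ("ntn", "Ptn", "nPn", "ntP", "PPP"):
        print(string, pivoted_components(string))
```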
All results of evaluation runs can be easily accessed here:
https://gist.github.com/ivmarkp
Comparing the MAP of "PPP" with that of "ntn" normalization, we get results
as follows:
PPP : 0.0607107
ntn : 0.109525
Clearly, the default normalization does a better job here than pivoted
normalization. But since we intended to add support for pivoted
normalization in Xapian rather than replace the default normalization with
it, I think this comparison should not come as a big surprise.
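For readers less familiar with the metric, the sketch below shows how mean average precision (MAP) is commonly computed from ranked result lists and relevance judgments. It is only an illustration under assumptions: the function names and the toy data are made up, and this is not the tool that produced the evaluation runs reported here.

```python
def average_precision(ranked_doc_ids, relevant_doc_ids):
    """Average precision for one query.

    Precision is sampled at every rank where a relevant document appears,
    and the sum is divided by the total number of relevant documents.
    """
    relevant = set(relevant_doc_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of the per-query average precision over (ranking, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy data (hypothetical): two queries with their rankings and judgments.
toy_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6", "d7"], {"d6"}),
]

if __name__ == "__main__":
    print(mean_average_precision(toy_runs))
```

A higher MAP means relevant documents tend to appear earlier in the rankings, which is why it is used to compare the normalization strings.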
Similarly, the MAP values of Ptn, nPn and ntP, which represent the "Pxx",
"xPx" and "xxP" normalization strings respectively, are as follows:
ntP: 0.0747668
nPn: 0.0676789
Ptn: 0.11379
Interestingly, Ptn normalization does better than all the other
normalizations, including the default ("ntn"), although its margin over
ntn is small (0.11379 vs 0.109525). So I think Ptn normalization can be
recommended for applications based on a news corpus that are exploring
options beyond the default tf-idf normalization.
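The comparison can be put side by side with a short sketch that ranks the five normalization strings by the MAP values reported in this post and expresses each one relative to the "ntn" baseline. The variable names are illustrative; the numbers are copied from the figures above.

```python
# MAP values as reported in this post.
reported_map = {
    "ntn": 0.109525,
    "PPP": 0.0607107,
    "Ptn": 0.11379,
    "nPn": 0.0676789,
    "ntP": 0.0747668,
}

baseline = reported_map["ntn"]

# Relative change of each scheme against the default normalization.
relative_change = {
    name: (score - baseline) / baseline
    for name, score in reported_map.items()
}

# Normalization strings ordered from best to worst MAP.
ranking = sorted(reported_map, key=reported_map.get, reverse=True)

if __name__ == "__main__":
    for name in ranking:
        print(f"{name}: MAP={reported_map[name]:.6f} "
              f"({relative_change[name]:+.1%} vs ntn)")
```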
As a small side note: I'm now planning to take up the additional tasks we
had deferred to the end. Before that, I was wondering whether this is the
right time to complete the documentation for the BM25+, PL2+, Dir+ and Piv+
weighting schemes, and whether the PRs for these weighting schemes can
finally be merged upstream. Please let me know if there are any loose ends
that need work before the PRs can be merged.
Regards,
Vivek