Weighting Schemes: Evaluation results

Vivek Pal vivekpal.dtu at gmail.com
Sun Jul 24 12:17:15 BST 2016


Hi all,

I have evaluated the new weighting schemes alongside their existing
counterparts in Xapian to see which does a better job.
I have also put all the result files together for easy access here:
https://github.com/ivmarkp/xapian-evaluation/tree/evaluation/run
along with a README for getting started with the xapian-evaluation module.
Hopefully it will be of help to those who are new to evaluating weighting
schemes in Xapian :)

Comparing MAP (mean average precision) to assess retrieval effectiveness,
some interesting results have emerged, as follows:

1. BM25+: 0.100415 and BM25: 0.101771

BM25 does a slightly better job here. My guess is that BM25+ falls short
because the data-set may lack very long documents.
I'm also thinking of revisiting the BM25+ PR and cross-checking the patch
against the original BM25+ formula to spot any mistake in the
implementation.
Let me know of any other ideas that could improve the performance of
BM25+.

2. PL2+: 0.0781953 and PL2: 0.0752646

Here, PL2+ does indeed do a better job of retrieving relevant documents,
although by a small margin.
I believe it should produce much better results at scale in practical use.
At this point, we might want to consider replacing PL2 with PL2+ in
Xapian.

3. LMWeight_Dirplus: 0.100168 and LMWeight_Dir: 0.100168

These results are for LMWeight with Dirplus and Dir smoothing respectively.
Interestingly, the results are identical.
Ideally, LMWeight_Dirplus should perform better, and I have similar
thoughts about it as for the BM25+ and BM25 results.
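
To illustrate the "very long documents" guess behind items 1 and 3, here is
a minimal sketch of the textbook BM25 and BM25+ term-frequency components.
It is not Xapian's implementation: the function names, the parameter
defaults (k1=1.2, b=0.75, delta=1.0) and the average length of 100 are
assumptions chosen only for illustration, and the IDF factor is left out.

```python
def bm25_tf(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """Textbook BM25 term-frequency component (IDF factor omitted)."""
    # Length normalisation: grows with doc_len relative to the average.
    norm = k1 * (1.0 - b + b * doc_len / avg_len)
    return (k1 + 1.0) * tf / (norm + tf)


def bm25plus_tf(tf, doc_len, avg_len, k1=1.2, b=0.75, delta=1.0):
    """BM25+ component: BM25 plus a constant delta for each matching term."""
    if tf <= 0:
        return 0.0  # the lower bound applies only to terms that occur
    return bm25_tf(tf, doc_len, avg_len, k1, b) + delta


if __name__ == "__main__":
    avg_len = 100.0  # hypothetical average document length
    for doc_len in (50, 100, 1000, 100000):
        plain = bm25_tf(1, doc_len, avg_len)
        plus = bm25plus_tf(1, doc_len, avg_len)
        print(f"dl={doc_len:>6}  BM25={plain:.4f}  BM25+={plus:.4f}")
```

For a single occurrence of a term, the BM25 component shrinks towards zero
as the document gets much longer than average, while the BM25+ component
never drops below delta. If the collection has few such documents, the
lower bound rarely comes into play, which would fit the guess above,
though it does not rule out an implementation mistake.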

The last addition to the weighting schemes (Piv+ normalization) is a work
in progress.
I've been sick these past few days, so things have moved slowly. I will be
completing its implementation in the upcoming week, along with the
evaluation.

Regards,
Vivek