Weighting Schemes: Evaluation results

James Aylett james-xapian at tartarus.org
Mon Aug 8 19:07:27 BST 2016


On Sun, Aug 07, 2016 at 11:32:27PM +0530, Vivek Pal wrote:

> All results of evaluation runs can be easily accessed here:
> https://gist.github.com/ivmarkp

Hey, that's great!

> Comparing the MAP of "PPP" with that of "ntn" normalization, we get results
> as follows:
> 
> PPP : 0.0607107
> ntn : 0.109525
> 
> Clearly, the default normalization does a better job here than pivoted
> normalization. But since we intended to add support for pivoted
> normalization to Xapian rather than replace the default normalization
> with it, I think this comparison may not come as a big surprise.

Hmm. It'd be nice if we knew what sort of corpus PPP would be good
for; is there something suggestive in the literature?

> Similarly, the MAP of Ptn, nPn and ntP which represent "Pxx", "xPx" and
> "xxP" normalization strings respectively are as follows:
> 
> ntP: 0.0747668
> nPn: 0.0676789
> Ptn: 0.11379
> 
> Interestingly, Ptn normalization does a better job than all the other
> normalizations, including the default ("ntn"), although its margin over
> ntn is small (0.11379 vs 0.109525). So I think applications based on a
> news corpus can be recommended to try Ptn normalization when exploring
> options beyond the default tf-idf normalization.

Sounds good!
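
For anyone reading along, here is a rough sketch of how figures like these
can be compared. It assumes MAP means the standard mean average precision
(the mean over queries of per-query average precision). The function names
are made up for illustration and are not part of Xapian or of the evaluation
scripts; the scores in the dict are the ones quoted in this thread.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query (hypothetical helper).

    Sums precision@k at every rank k that holds a relevant document,
    then divides by the total number of relevant documents.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """MAP over all judged queries (hypothetical helper).

    runs:  {query_id: [doc_id, ...]} ranked results per query
    qrels: {query_id: [relevant doc_id, ...]} relevance judgements
    A judged query with no results contributes an AP of 0.
    """
    if not qrels:
        return 0.0
    total = sum(average_precision(runs.get(q, []), rel)
                for q, rel in qrels.items())
    return total / len(qrels)


def relative_change(new, baseline):
    """Relative difference of `new` against `baseline`."""
    return (new - baseline) / baseline


# MAP values as reported in this thread, keyed by normalization string.
reported_map = {
    "ntn": 0.109525,   # default
    "PPP": 0.0607107,
    "ntP": 0.0747668,
    "nPn": 0.0676789,
    "Ptn": 0.11379,
}

best = max(reported_map, key=reported_map.get)
gain_over_default = relative_change(reported_map["Ptn"], reported_map["ntn"])
# gain_over_default is about 0.039, i.e. roughly a 3.9% relative gain
```

On these numbers Ptn is ahead of ntn by a few percent in relative terms, so
it is probably worth checking on another corpus before reading too much
into the difference.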

> As a small side note: I'm now planning to take up the additional tasks
> we were planning to work on at the end, but before that I was
> wondering if this is the right time to complete the documentation
> for the BM25+, PL2+, Dir+ and Piv+ weighting schemes

I think completing the documentation is the right priority.

> and also whether the PRs for these weighting schemes can finally be
> merged upstream?  Please let me know if there are any loose ends that
> might need some work before the PRs can be merged.

Assuming you've addressed all the earlier comments (which I think you
have), I think it's down to us at this point :-)

I've been holding back on merging largely because I have a host of
other things going on. I don't see any significant hold-ups other than
that, although I'm not sure (because I haven't had to deal with it
before) in what way we need to change the ABI number for these
changes. Not sure if Olly has been following this work closely enough
to be able to comment, or if we're going to have to find some time to
sit down and figure it out (along with whether we merge these changes
into 1.4.x).

J

-- 
  James Aylett, occasional trouble-maker
  xapian.org


