<div dir="ltr">> <span style="font-size:12.8px">It'd be nice if we knew what sort of corpus PPP would be good</span><br style="font-size:12.8px"><span style="font-size:12.8px">> for; is there something suggestive in the literature?</span><br style="font-size:12.8px"><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><font color="#000000">There isn't anything specific mentioned for Piv+ similar to what we</font></span></div><div><span style="font-size:12.8px"><font color="#000000">had for BM25+ previously, but I'm positive that the corpora used are </font></span></div><div><div><font color="#000000"><span style="font-size:12.8px">four TREC collections: WT2G, WT10G, Terabyte</span><span style="font-size:12.8px">, and Robust04, which </span></font></div><div><font color="#000000"><span style="font-size:12.8px">basically represent different sizes and </span><span style="font-size:12.8px">genres of text collections.</span></font></div></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">> </span><span style="font-size:12.8px">Trying to complete the documentation I think is the right priority.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><font color="#000000">Okay, I'm on it -- I'll open PRs for this soon.</font></span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">> </span><span style="font-size:12.8px">Assuming you've addressed all the earlier comments (which I think you</span></div><span style="font-size:12.8px">> have), I think it's down to us at this point :-)</span><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><font color="#000000">Thanks, that's great. 
Just to make sure everything is in place, I'll take a quick</font></span></div><div><span style="font-size:12.8px"><font color="#000000">glance over things again.</font></span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">> </span><span style="font-size:12.8px">I don't see any significant hold ups other than</span></div><span style="font-size:12.8px">> that, although I'm not sure (because I haven't had to deal with it</span><br style="font-size:12.8px"><span style="font-size:12.8px">> before) in what way we need to change the ABI number for these</span><br style="font-size:12.8px"><span style="font-size:12.8px">> changes.</span><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><font color="#000000">I don't think I have much to add here, although I recall you mentioning</font></span></div><div><font color="#000000"><span style="font-size:12.8px">in the mid-term meeting that </span><span style="font-size:12.8px">these changes should go </span><span style="font-size:12.8px">into the 1.5 series rather than</span></font></div><div><font color="#000000"><span style="font-size:12.8px">land as a brand new feature </span><span style="font-size:12.8px">in a 1.4 release, or </span><span style="font-size:12.8px">something </span><span style="font-size:12.8px">similar, if I remember correctly. 
:)</span></font></div><div><font color="#000000"><span style="font-size:12.8px"><br></span></font></div><div><font color="#000000"><span style="font-size:12.8px">Also, now that the submission week is nearing, I was wondering how best to list </span></font></div><div><font color="#000000"><span style="white-space:pre-wrap">the different pieces of project work: should they go under work that has been merged, or would it be fine to list them</span></font></div><div><font color="#000000"><span style="white-space:pre-wrap">as work that hasn't been merged?</span></font></div><div><font color="#000000"><span style="white-space:pre-wrap"><br></span></font></div><div><font color="#000000"><span style="white-space:pre-wrap">Thanks,</span></font></div><div><font color="#000000"><span style="white-space:pre-wrap">Vivek</span></font></div><div><font color="#000000"><span style="white-space:pre-wrap"><br></span></font></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Aug 8, 2016 at 11:37 PM, James Aylett <span dir="ltr"><<a href="mailto:james-xapian@tartarus.org" target="_blank">james-xapian@tartarus.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sun, Aug 07, 2016 at 11:32:27PM +0530, Vivek Pal wrote:<br>
<br>
> All results of evaluation runs can be easily accessed here:<br>
> <a href="https://gist.github.com/ivmarkp" rel="noreferrer" target="_blank">https://gist.github.com/<wbr>ivmarkp</a><br>
<br>
</span>Hey, that's great!<br>
<span class=""><br>
> Comparing the MAP of "PPP" with that of "ntn" normalization, we get results<br>
> as follows:<br>
><br>
> PPP : 0.0607107<br>
> ntn : 0.109525<br>
><br>
> Clearly, the default normalization does a better job here than pivoted<br>
> normalization, but since we intended to add support for pivoted<br>
> normalization in Xapian rather than replace the default<br>
> normalization with it, I think this comparison may not<br>
> come as a big surprise.<br>
<br>
</span>Hmm. It'd be nice if we knew what sort of corpus PPP would be good<br>
for; is there something suggestive in the literature?<br>
<span class=""><br>
> Similarly, the MAP values for Ptn, nPn and ntP, which represent the "Pxx", "xPx" and<br>
> "xxP" normalization strings respectively, are as follows:<br>
><br>
> ntP: 0.0747668<br>
> nPn: 0.0676789<br>
> Ptn: 0.11379<br>
><br>
> Interestingly, Ptn normalization does a better job than all the other<br>
> normalizations, including the default normalization ("ntn"). So I think<br>
> applications built on a news corpus can be advised to try<br>
> Ptn normalization when exploring options beyond the default tf-idf<br>
> normalization.<br>
<br>
</span>Sounds good!<br>
<span class=""><br>
> As a small side note -- I'm now planning to take up the additional tasks<br>
> we were looking to work on at the end, but before that I was<br>
> wondering if this is the right time to complete the documentation<br>
> for the BM25+, PL2+, Dir+ and Piv+ weighting schemes<br>
<br>
</span>Trying to complete the documentation I think is the right priority.<br>
<span class=""><br>
> and also if PRs for these weighting schemes can be merged upstream<br>
> finally? Please let me know if there are any loose ends that might<br>
> need some work before PRs can be merged.<br>
<br>
</span>Assuming you've addressed all the earlier comments (which I think you<br>
have), I think it's down to us at this point :-)<br>
<br>
I've been holding back on merging largely because I have a host of<br>
other things going on. I don't see any significant hold ups other than<br>
that, although I'm not sure (because I haven't had to deal with it<br>
before) in what way we need to change the ABI number for these<br>
changes. Not sure if Olly has been following this work closely enough<br>
to be able to comment, or if we're going to have to find some time to<br>
sit down and figure it out (along with whether we merge these changes<br>
into 1.4.x).<br>
<div class="HOEnZb"><div class="h5"><br>
J<br>
<br>
--<br>
James Aylett, occasional trouble-maker<br>
<a href="http://xapian.org" rel="noreferrer" target="_blank">xapian.org</a><br>
<br>
</div></div></blockquote></div><br></div>