<div dir="ltr">> <span style="font-size:12.8px">It'd be nice if we knew what sort of corpus PPP would be good</span><br style="font-size:12.8px"><span style="font-size:12.8px">> for; is there something suggestive in the literature?</span><br style="font-size:12.8px"><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><font color="#000000">There isn't anything specific mentioned for Piv+ similar to what we</font></span></div><div><span style="font-size:12.8px"><font color="#000000">had for BM25+ previously, but I'm positive that the corpora used are </font></span></div><div><div><font color="#000000"><span style="font-size:12.8px">four TREC collections: WT2G, WT10G, Terabyte, and Robust04, which </span></font></div><div><font color="#000000"><span style="font-size:12.8px">basically represent different sizes and </span><span style="font-size:12.8px">genres of text collections.</span></font></div></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">> </span><span style="font-size:12.8px">Trying to complete the documentation I think is the right priority.</span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><font color="#000000">Okay, I'm on it -- I'll open PRs for it soon.</font></span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">> </span><span style="font-size:12.8px">Assuming you've addressed all the earlier comments (which I think you</span></div><span style="font-size:12.8px">> have), I think it's down to us at this point :-)</span><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><font color="#000000">Thanks, that's great. 
Just to make sure everything is in place, I'll take a quick</font></span></div><div><span style="font-size:12.8px"><font color="#000000">glance over things again.</font></span></div><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px">> </span><span style="font-size:12.8px">I don't see any significant hold ups other than</span></div><span style="font-size:12.8px">> that, although I'm not sure (because I haven't had to deal with it</span><br style="font-size:12.8px"><span style="font-size:12.8px">> before) in what way we need to change the ABI number for these</span><br style="font-size:12.8px"><span style="font-size:12.8px">> changes.</span><div><span style="font-size:12.8px"><br></span></div><div><span style="font-size:12.8px"><font color="#000000">I have little to add here, although I recall you mentioning</font></span></div><div><font color="#000000"><span style="font-size:12.8px">in the mid-term meeting that </span><span style="font-size:12.8px">these changes should go </span><span style="font-size:12.8px">into the 1.5 series rather than</span></font></div><div><font color="#000000"><span style="font-size:12.8px">land as a brand new thing </span><span style="font-size:12.8px">in the 1.4 release, or </span><span style="font-size:12.8px">something </span><span style="font-size:12.8px">similar, if I remember correctly. 
:)</span></font></div><div><font color="#000000"><span style="font-size:12.8px"><br></span></font></div><div><font color="#000000"><span style="font-size:12.8px">Also, now that the submission week is nearing, I was wondering how best to present the </span></font></div><div><font color="#000000"><span style="white-space:pre-wrap">list of different pieces of project work: as work that has been merged, or would it be fine to list them</span></font></div><div><font color="#000000"><span style="white-space:pre-wrap">as work that hasn't been merged?</span></font></div><div><font color="#000000"><span style="white-space:pre-wrap"><br></span></font></div><div><font color="#000000"><span style="white-space:pre-wrap">Thanks,</span></font></div><div><font color="#000000"><span style="white-space:pre-wrap">Vivek</span></font></div><div><font color="#000000"><span style="white-space:pre-wrap"><br></span></font></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Aug 8, 2016 at 11:37 PM, James Aylett <span dir="ltr"><<a href="mailto:james-xapian@tartarus.org" target="_blank">james-xapian@tartarus.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Sun, Aug 07, 2016 at 11:32:27PM +0530, Vivek Pal wrote:<br>

<br>

> All results of evaluation runs can be easily accessed here:<br>

> <a href="https://gist.github.com/ivmarkp" rel="noreferrer" target="_blank">https://gist.github.com/<wbr>ivmarkp</a><br>

<br>

</span>Hey, that's great!<br>

<span class=""><br>

> Comparing the MAP of "PPP" with that of "ntn" normalization, we get results<br>

> as follows:<br>

><br>

> PPP : 0.0607107<br>

> ntn : 0.109525<br>

><br>

> Clearly, the default normalization does a better job here than pivoted<br>

> normalization, but since we intended to support pivoted<br>

> normalization in Xapian rather than replace the default<br>

> normalization with it, I think this comparison may not<br>

> come as a big surprise.<br>

<br>

</span>Hmm. It'd be nice if we knew what sort of corpus PPP would be good<br>

for; is there something suggestive in the literature?<br>

<span class=""><br>

> Similarly, the MAP of Ptn, nPn and ntP which represent "Pxx", "xPx" and<br>

> "xxP" normalization strings respectively are as follows:<br>

><br>

> ntP: 0.0747668<br>

> nPn: 0.0676789<br>

> Ptn: 0.11379<br>

><br>

> Interestingly, Ptn normalization does a better job than all the other<br>

> normalizations and the default normalization ("ntn") as well. So, I think<br>

> applications based on a news corpus can be recommended to use Ptn<br>

> normalization if exploring options beyond the default tf-idf<br>

> normalization.<br>

<br>

</span>Sounds good!<br>

<span class=""><br>

> As a small side note -- now I'm planning to take up additional tasks<br>

> we were looking to work on in the end but before that I was<br>

> wondering if this is the right time to complete the documentation<br>

> part of BM25+, PL2+, Dir+ and Piv+ weighting schemes<br>

<br>

</span>Trying to complete the documentation I think is the right priority.<br>

<span class=""><br>

> and also if PRs for these weighting schemes can be merged upstream<br>

> finally?  Please let me know if there are any loose ends that might<br>

> need some work before PRs can be merged.<br>

<br>

</span>Assuming you've addressed all the earlier comments (which I think you<br>

have), I think it's down to us at this point :-)<br>

<br>

I've been holding back on merging largely because I have a host of<br>

other things going on. I don't see any significant hold ups other than<br>

that, although I'm not sure (because I haven't had to deal with it<br>

before) in what way we need to change the ABI number for these<br>

changes. Not sure if Olly has been following this work closely enough<br>

to be able to comment, or if we're going to have to find some time to<br>

sit down and figure it out (along with whether we merge these changes<br>

into 1.4.x).<br>

<div class="HOEnZb"><div class="h5"><br>

J<br>

<br>

--<br>

  James Aylett, occasional trouble-maker<br>

  <a href="http://xapian.org" rel="noreferrer" target="_blank">xapian.org</a><br>

<br>

</div></div></blockquote></div><br></div>