<div dir="ltr"><div><div><div>For unsupervised weighting schemes like BM25, boosting the title this way works well, but letor does not need a special title weight of this kind, because it assigns weights to the title features separately. <br><br>
</div>But I see your concern: it would be a problem when BM25 is used on an index built with this setup. Hence it is preferable for xapian-letor to take note of this uplift in title weight and normalize for it wherever the statistics are calculated.<br>
<br></div>Cheers,<br></div>Parth.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Thu, Mar 20, 2014 at 2:35 AM, Olly Betts <span dir="ltr"><<a href="mailto:olly@survex.com" target="_blank">olly@survex.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On Mon, Mar 17, 2014 at 09:07:29PM +0100, Parth Gupta wrote:<br>
> Wouldn't setting the weight of terms in the title back to normal (e.g. 5 to 1)<br>
> with the line below automatically adjust the wdfs and field lengths?<br>
><br>
> indexer.index_text(title, 5, "S"); -> indexer.index_text(title, 1, "S");<br>
><br>
> If it does not, then we should include that part in the patch too. I'd like to<br>
> create a patch for xapian-letor for resolving common code of xapian.<br>
<br>
</div>I'm not sure I follow.<br>
<br>
The reason we use 5 here is that matching terms in the page<br>
title are usually a good indicator of a page that should be<br>
ranked highly for a search (note omindex is not usually working in a<br>
domain where evil SEOs are trying to distort the rankings).<br>
<br>
If we simply change 5 to 1 here, then the title won't be given any extra<br>
consideration, which would be a regression in this area.<br>
<br>
Cheers,<br>
Olly<br>
</blockquote></div><br></div>
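The normalization suggested in the reply above can be illustrated with a small standalone sketch. It does not use the Xapian API: `WdfMap`, `add_text`, `field_length` and `normalized_wdf` are hypothetical names, and the model simply assumes that indexing text with a given `wdf_inc` adds that amount to a term's wdf per occurrence, and that a field's length is the sum of the wdfs of its prefixed terms.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Toy model of per-document wdf bookkeeping. This is NOT the Xapian API;
// all names here are hypothetical and exist only for illustration.
using WdfMap = std::map<std::string, unsigned>;

// Assumed effect of index_text(text, wdf_inc, prefix): each
// whitespace-separated word adds wdf_inc to the wdf of prefix + word.
void add_text(WdfMap& wdf, const std::string& text, unsigned wdf_inc,
              const std::string& prefix) {
    std::istringstream words(text);
    std::string word;
    while (words >> word) wdf[prefix + word] += wdf_inc;
}

// Field length as the sum of wdfs over all terms carrying the prefix.
// Assumes prefixes are upper case and words are lower case, so an
// unprefixed word is never mistaken for a prefixed term.
unsigned field_length(const WdfMap& wdf, const std::string& prefix) {
    unsigned total = 0;
    for (const auto& entry : wdf) {
        if (entry.first.compare(0, prefix.size(), prefix) == 0)
            total += entry.second;
    }
    return total;
}

// Undo the title boost when computing a feature: divide the stored wdf
// by the boost that was applied at index time.
unsigned normalized_wdf(const WdfMap& wdf, const std::string& term,
                        unsigned boost) {
    assert(boost > 0);
    auto it = wdf.find(term);
    return it == wdf.end() ? 0 : it->second / boost;
}
```

With a title "xapian letor" added at a boost of 5 under the prefix "S", this model stores a wdf of 5 for "Sxapian" and a title field length of 10. Dividing by the boost gives 1 and 2 respectively, which are the values that would have been stored with a boost of 1. The index itself keeps the boosted values, so BM25 ranking on it is unaffected.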