[Xapian-devel] [GSOC 2014] Indexing INEX dataset

Parth Gupta pargup8 at gmail.com
Sat Mar 22 08:22:26 GMT 2014


For unsupervised approaches like BM25 this works well, but letor does not
need special weighting for the title in this form, as it assigns weights to
the title features separately.

But I see your concern: it would be a problem when BM25 is used on an index
built with this setup (title indexed with weight 1). Hence it's preferable
to keep the title weight as it is, take note of this uplift in xapian-letor,
and normalize it wherever the statistics are calculated.
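
A rough sketch of what I mean by normalizing, assuming every title term
occurrence was indexed with the same wdf increment. The constant and the
function name below are hypothetical, not existing xapian-letor code, and
plain unsigned stands in for Xapian's count types:

```cpp
#include <cassert>

// Hypothetical constant: the wdf increment omindex passes to index_text()
// for title terms (the "5" discussed in this thread).
const unsigned TITLE_WDF_BOOST = 5;

// Undo the title boost on a raw title statistic (a wdf or a field length)
// before it is used to compute a letor feature.
// Assumption: each occurrence added exactly `boost` to the raw value, so
// the raw value is a multiple of `boost` and integer division is exact.
unsigned normalize_title_stat(unsigned raw, unsigned boost = TITLE_WDF_BOOST) {
    assert(boost > 0);
    return raw / boost;
}
```

For example, a term occurring twice in the title is stored with wdf
2 * 5 = 10, and normalize_title_stat(10) gives back 2. This way the index
keeps the boost that BM25 benefits from, and only letor's statistics see
the unboosted counts.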

Cheers,
Parth.


On Thu, Mar 20, 2014 at 2:35 AM, Olly Betts <olly at survex.com> wrote:

> On Mon, Mar 17, 2014 at 09:07:29PM +0100, Parth Gupta wrote:
> > Wouldn't setting the weight of terms in title back to normal (e.g. 5
> > to 1) by below line, automatically adjust the wdfs and field lengths?
> >
> > indexer.index_text(title, 5, "S"); ->  indexer.index_text(title, 1, "S");
> >
> > If it does not, then we should include that part in the patch too. I'd
> > like to create a patch for xapian-letor for resolving common code of
> > xapian.
>
> I'm not sure I follow.
>
> The reason we use 5 here is that matching terms in the title are
> usually a good indicator of a page that should be ranked highly for a
> search (note omindex is not usually working in a domain where evil SEOs
> are trying to distort the rankings).
>
> If we simply change 5 to 1 here, then the title won't be given any extra
> consideration, which would be a regression in this area.
>
> Cheers,
>     Olly
>