[Xapian-devel] [GSOC 2014] Indexing INEX dataset

Olly Betts olly at survex.com
Thu Mar 20 01:35:20 GMT 2014


On Mon, Mar 17, 2014 at 09:07:29PM +0100, Parth Gupta wrote:
> Wouldn't setting the weight of terms in title back to normal (e.g. 5 to 1)
> by below line, automatically adjust the wdfs and field lengths?
> 
> indexer.index_text(title, 5, "S"); ->  indexer.index_text(title, 1, "S");
> 
> if it does not then we should include that part in the patch too. I like to
> create a patch for xapian-letor for resolving common code of xapian.

I'm not sure I follow.

The reason we use 5 here is that matching terms in the page title are
usually a good indicator of a page that should be ranked highly for a
search (note that omindex is not usually working in a domain where evil
SEOs are trying to distort the rankings).

If we simply change 5 to 1 here, then the title won't be given any extra
consideration, which would be a regression in this area.
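
As a rough illustration of what the wdf increment does, here is a toy
model in plain C++. It is only a sketch and not Xapian's implementation:
WdfTable, toy_index_text(), toy_doc_length() and title_term_wdf() are
made-up names, the sample title and body text are invented, and real
indexing also involves stemming, positions and so on. It assumes each
occurrence of a word adds the given increment to that term's wdf, and
that the document length is the sum of the wdfs.

```cpp
#include <map>
#include <sstream>
#include <string>

// Toy stand-in for a document's term -> wdf table (hypothetical, not
// Xapian's real data structure).
using WdfTable = std::map<std::string, unsigned>;

// Hypothetical helper: split `text` on whitespace and, for each word,
// add `wdf_inc` to the wdf of the term `prefix + word`.
void toy_index_text(WdfTable& doc, const std::string& text,
                    unsigned wdf_inc, const std::string& prefix) {
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        doc[prefix + word] += wdf_inc;
    }
}

// Toy document length: the sum of all wdf values in the table.
unsigned toy_doc_length(const WdfTable& doc) {
    unsigned total = 0;
    for (const auto& entry : doc) {
        total += entry.second;
    }
    return total;
}

// Index an invented two-word title with the given increment, then an
// invented nine-word body with increment 1, and return the resulting
// table.
WdfTable toy_index_page(unsigned title_wdf_inc) {
    WdfTable doc;
    toy_index_text(doc, "xapian indexing", title_wdf_inc, "S");
    toy_index_text(doc, "xapian is a search library and xapian is fast",
                   1, "");
    return doc;
}

// wdf of the title term "Sxapian" for a given title increment.
unsigned title_term_wdf(unsigned title_wdf_inc) {
    WdfTable doc = toy_index_page(title_wdf_inc);
    return doc["Sxapian"];
}
```

In this toy model, title_term_wdf(5) gives 5 and title_term_wdf(1) gives
1, so with an increment of 1 a title term counts no more than a single
occurrence in the body.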

Cheers,
    Olly


