[Xapian-devel] [GSOC 2014] Indexing INEX dataset

Parth Gupta pargup8 at gmail.com
Mon Mar 17 20:07:29 GMT 2014


Hi Olly,

Wouldn't setting the weight (wdf increment) of title terms back to normal
(i.e. 5 to 1) with the change below automatically adjust the wdfs and field
lengths?

indexer.index_text(title, 5, "S"); ->  indexer.index_text(title, 1, "S");

If it does not, then we should include that part in the patch too. I would
like to create a patch for xapian-letor to sort out the code it has in common
with the rest of Xapian.
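
To make the question concrete, here is a toy model of the arithmetic. It is
not Xapian code: FieldStats, toy_index_text() and relative_wdf() are
hypothetical names made up for illustration, and the tokenising is plain
whitespace splitting with no stemming or case folding. It only assumes that
each term occurrence adds the wdf increment to that term's wdf, and that the
field length is the sum of the wdfs in the field.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Toy stand-in for per-field statistics; NOT Xapian's data structures.
struct FieldStats {
    std::map<std::string, unsigned> wdf;  // term -> within-document frequency
    unsigned length = 0;                  // sum of all wdfs in the field
};

// Hypothetical helper mimicking the shape of index_text(text, wdf_inc, prefix):
// every whitespace-separated word adds wdf_inc to that term's wdf.
FieldStats toy_index_text(const std::string& text, unsigned wdf_inc,
                          const std::string& prefix) {
    assert(wdf_inc > 0);
    FieldStats stats;
    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        stats.wdf[prefix + word] += wdf_inc;
        stats.length += wdf_inc;
    }
    return stats;
}

// Relative frequency of a term within the field: wdf / field length.
double relative_wdf(const FieldStats& stats, const std::string& term) {
    auto it = stats.wdf.find(term);
    if (it == stats.wdf.end() || stats.length == 0) return 0.0;
    return static_cast<double>(it->second) / stats.length;
}
```

For the title "xapian search xapian", an increment of 5 gives a wdf of 10 for
Sxapian and a field length of 15, while an increment of 1 gives 2 and 3. The
ratio wdf / length is the same in both cases, but any feature that uses the
raw counts would differ by the factor of 5.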

Cheers,
Parth.

On Wed, Mar 12, 2014 at 3:13 AM, Jiarong Wei <vcamx3 at gmail.com> wrote:

> Thank you Parth and Olly! I'll try it :)
>
> Jiarong Wei
>
> On Mar 11, 2014, at 16:57, Olly Betts <olly at survex.com> wrote:
>
> > On Tue, Mar 11, 2014 at 03:20:31PM +0100, Parth Gupta wrote:
> >>>
> >>> On current trunk, we index the title with prefix "S" by default in
> >>> omindex, though with a wdf inc of 5 rather than 1:
> >>>
> >>>            indexer.index_text(title, 5, "S");
> >>>
> >>> So I don't think you need that change to omindex now.
> >>
> >> Yes, but please make sure to change 5 to 1; otherwise divide the final
> >> count statistics by 5. :)
> >
> > We really need to resolve any instances where letor requires code in
> > other parts of Xapian to be patched.
> >
> > In this case, possibly the bias on the title should be done differently,
> > but won't this just mean both the wdfs and the field length for the S
> > prefix are 5 times larger, and it won't matter?
> >
> > Cheers,
> >    Olly
>
>