[Xapian-devel] [GSOC 2014] Indexing INEX dataset

Parth Gupta pargup8 at gmail.com
Sat Mar 22 18:27:55 GMT 2014


Yes, James. Is there any automatic way to find out which wdf weight was used
for titles or, more generally, for terms with a given prefix?
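
If no automatic way exists, one option is to make the uplift configurable. Below is a minimal sketch in plain Python (no Xapian calls) of the normalization discussed in the quoted thread: dividing a term's stored wdf by the per-prefix uplift before computing statistics. The function name `normalize_wdf` and the `UPLIFT_BY_PREFIX` mapping are hypothetical; the value 5 for titles is the omindex figure mentioned in the thread, and "S" as the title prefix is assumed from the usual Xapian prefix conventions.

```python
# Hypothetical configuration: index-time wdf multiplier for each term prefix.
# "S" (title) -> 5 mirrors the omindex behaviour mentioned in this thread;
# anyone indexing differently would supply their own values.
UPLIFT_BY_PREFIX = {"S": 5}


def normalize_wdf(raw_wdf, prefix, uplift_by_prefix=UPLIFT_BY_PREFIX):
    """Return the wdf with any index-time uplift for `prefix` divided out.

    Prefixes that are not configured are assumed to have an uplift of 1,
    so their wdf is returned unchanged (as a float).
    """
    uplift = uplift_by_prefix.get(prefix, 1)
    if uplift <= 0:
        raise ValueError("uplift must be positive, got %r" % (uplift,))
    return raw_wdf / uplift


if __name__ == "__main__":
    # A title term that occurs twice is stored with wdf 10 under an uplift of 5.
    print(normalize_wdf(10, "S"))
    # An unprefixed body term is left as it is.
    print(normalize_wdf(3, ""))
```

Keeping the uplift in a mapping rather than hard-coding 5 would cover the case where an index was built with a different value, or with no uplift at all.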



On Sat, Mar 22, 2014 at 1:35 PM, James Aylett <james-xapian at tartarus.org> wrote:

> On 22 Mar 2014, at 08:22, Parth Gupta <pargup8 at gmail.com> wrote:
>
> > For unsupervised approaches like BM25 this approach works well, but letor
> > does not need special weighting for titles in this form, as it assigns
> > weights to title features separately.
> >
> > But I see your concern: it would be a problem when BM25 is used on an
> > index with this setup. Hence it's preferable for xapian-letor to take note
> > of this uplift in title weight and normalize for it wherever it calculates
> > the statistics.
>
> This would need configuring, though, wouldn't it? Not everyone (and I'm
> thinking of people who don't index using omindex here) applies a wdf of 5
> while indexing titles; they may apply a different non-1 number, or just
> leave it at 1 (and possibly apply weighting at search time).
>
> J
>
> --
>  James Aylett, occasional trouble-maker
>  xapian.org
>
>