[Xapian-devel] GSoc Project Idea Weighting Schemes (Ranking)

Olly Betts olly at survex.com
Sun Nov 23 21:50:39 GMT 2014


On Sun, Nov 23, 2014 at 01:59:53PM +0530, Abhishek Singh Kushwah wrote:
> New weighting schemes, or improvements to the existing ones, such as
> changing the default scheme from BM25 to SMART, with reference to this
> paper: www.aclweb.org/anthology/P10-1141

I don't see any motivation there for changing the default.  A quote from
that paper actually notes explicitly that BM25 is much more successful
in general, while performing similarly in this particular case:

    Of interest is the fact that although the BM25 tf algorithm has
    proved much more successful in IR, the same doesn’t apply in this
    setting and its accuracy is similar to the simpler augmented tf
    approach.
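
For anyone unfamiliar with the two tf components being compared, here is
a minimal sketch of their usual textbook definitions: BM25's saturating tf
with parameters k1 and b, and SMART's "augmented" tf, 0.5 + 0.5 * tf / max_tf.
The function names and the defaults k1 = 1.2, b = 0.75 are illustrative
assumptions, not Xapian's settings or the paper's.

```python
def bm25_tf(tf: float, doc_len: float, avg_doc_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 term-frequency component (illustrative defaults for k1 and b).

    Grows with tf but saturates towards k1 + 1; documents longer than
    average are penalised through the b parameter.
    """
    if tf <= 0:
        return 0.0
    length_norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return (k1 + 1.0) * tf / (length_norm + tf)


def augmented_tf(tf: float, max_tf: float) -> float:
    """SMART augmented tf: 0.5 + 0.5 * tf / max_tf for terms present.

    max_tf is the frequency of the most frequent term in the document.
    """
    if tf <= 0:
        return 0.0
    return 0.5 + 0.5 * tf / max_tf
```

Both increase with tf, but the BM25 component is bounded above by k1 + 1
and depends on document length, while augmented tf only rescales tf
relative to the document's most frequent term.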

> After skimming through the schemes implemented in Xapian::Weight, there
> seems to be considerable scope for improving the existing algorithms'
> efficiency and speed, and for implementing new ones.

Where do you think speed and efficiency can be improved?

> I would like the mentors' point of view on new schemes for the project,
> with respect to SMART and others.

It needs to be possible to implement a scheme sanely within Xapian's
weighting framework.  Needing to track more statistics is probably
OK though (e.g. LM required adding support for getting the number of
unique terms in each document).
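
As an illustration of why a scheme might need such an extra statistic:
absolute-discount smoothing for language models, as described in the IR
literature, uses the number of unique terms in a document |d|_u as well as
its length |d|:

    p(w|d) = max(c(w;d) - delta, 0) / |d| + (delta * |d|_u / |d|) * p(w|C)

Below is a minimal Python sketch of that textbook formula.  It is not
Xapian's LMWeight code; the function name, the default delta = 0.7 and
passing the collection model p(w|C) as a plain dict are all assumptions
made for illustration.

```python
from typing import Mapping


def absolute_discount_prob(term: str,
                           doc_counts: Mapping[str, int],
                           collection_prob: Mapping[str, float],
                           delta: float = 0.7) -> float:
    """p(term | doc) with absolute-discount smoothing (illustrative sketch).

    doc_counts maps each term in the document to its count (all > 0).
    Besides the term's count and the document length, this needs the
    number of unique terms in the document.
    """
    doc_len = sum(doc_counts.values())      # |d|
    unique_terms = len(doc_counts)          # |d|_u: the extra statistic
    if doc_len == 0:
        # Empty document: fall back entirely to the collection model.
        return collection_prob.get(term, 0.0)
    count = doc_counts.get(term, 0)         # c(w; d)
    discounted = max(count - delta, 0.0) / doc_len
    # Probability mass removed by discounting, redistributed according to
    # the collection model.
    backoff_weight = delta * unique_terms / doc_len
    return discounted + backoff_weight * collection_prob.get(term, 0.0)
```

The back-off weight grows with |d|_u because each distinct term in the
document gives up delta of its count, so this scheme cannot be computed
from document length alone.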

Schemes which have been evaluated and shown to be promising (even if in
a restricted domain) are more interesting.

We aren't looking for students to develop their own weighting scheme
from scratch as part of a GSoC project (someone proposed this in a
previous GSoC).

Were you the "abhishek" asking recently on IRC about installing on
mingw?  If so and you didn't already resolve that, showing us the error
would probably enable us to help.

Cheers,
    Olly
