[Xapian-devel] GSoC, Xapian Project Weighting Schemes

Mohd Azeem azeem201001 at yahoo.in
Mon Apr 2 07:10:15 BST 2012


Hello all,
I am very sorry I did not include the xapian-devel mailing list in my previous mail.
Thanks for responding to my mail.

Mohd Azeem
NIT UK

________________________________
 From: Olly Betts <olly at survex.com>
To: Mohd Azeem <azeem201001 at yahoo.in> 
Cc: Parth Gupta <parthg.88 at gmail.com> 
Sent: Saturday, 31 March 2012 11:40 AM
Subject: Re: GSoC, Xapian Project Weighting Schemes
 
Please DON'T mail individual mentors privately - use the xapian-devel
mailing list instead.

On Sat, Mar 31, 2012 at 01:35:16PM +0800, Mohd Azeem wrote:
> Presently Xapian
> provides the ability to rank search results by mathematical
> formulas like tf*idf and BM25.

Actually, you can already rank results by incoming hyperlink counts, or
any query-independent factor(s) you want to keep track of, and you can
combine that with term-based weights.  This is done by creating a
PostingSource subclass and using it in the query:

http://xapian.org/docs/postingsource.html
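As a rough illustration of the effect described above, here is a minimal pure-Python sketch (not Xapian's API) of summing a query-independent per-document weight with term-based weights. The document names, scores, link counts and the log-damped `static_weight` helper are all hypothetical. Only documents that already match the query receive the boost, which is roughly how a PostingSource behaves when combined with a query via something like OP_AND_MAYBE.

```python
import math

# Hypothetical term-based scores (e.g. from BM25) for documents matching a query.
term_weights = {"doc1": 7.2, "doc2": 6.9, "doc3": 3.5}

# Hypothetical in-bound hyperlink counts: a query-independent factor.
# doc4 does not match the query, so it never appears in the results.
inbound_links = {"doc1": 0, "doc2": 40, "doc3": 500, "doc4": 12}


def static_weight(links, scale=0.5):
    """Hypothetical mapping from a link count to an additive weight.

    log1p damps very large counts so they cannot swamp the term weights;
    this damping is an assumption, not something Xapian prescribes.
    """
    return scale * math.log1p(links)


def combined_ranking(term_weights, inbound_links):
    """Rank matching documents by term weight PLUS static weight."""
    scores = {
        doc: w + static_weight(inbound_links.get(doc, 0))
        for doc, w in term_weights.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


for doc, score in combined_ranking(term_weights, inbound_links):
    print(doc, round(score, 3))
```

The damping function and scale are the main tuning knobs. Without them, a heavily linked page could outrank better textual matches purely on popularity.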

> weight S = S1 (weight calculated by BM25) * S2 (weight of document
> calculated based on

You can't multiply the factors like this with a PostingSource, only add
them - is there any theoretical or experimental basis for multiplying
the weight contributions in this situation?
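To see why the choice between adding and multiplying matters, here is a small hypothetical sketch. The (S1, S2) values are invented, with S1 standing for the term-based weight and S2 for the query-independent weight. The two combinations rank the same documents in opposite orders. Under multiplication, a document with S2 = 0 also drops to a total of zero however well it matches the query.

```python
# Hypothetical (S1 = term weight, S2 = static weight) pairs per document.
docs = {"a": (8.0, 1.0), "b": (4.0, 3.0), "c": (10.0, 0.0)}


def rank(combine):
    """Return document ids sorted by the combined score, best first."""
    return sorted(docs, key=lambda d: combine(*docs[d]), reverse=True)


additive = rank(lambda s1, s2: s1 + s2)        # scores: c=10, a=9, b=7
multiplicative = rank(lambda s1, s2: s1 * s2)  # scores: b=12, a=8, c=0
print(additive, multiplicative)
```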

So your suggested project would involve counting up in-bound hyperlinks,
and writing a simple PostingSource class to use them, plus perhaps
adding a new query operator which multiplies weights.  Unfortunately
that doesn't seem like it would be nearly enough work for a GSoC
project.
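The first part of the project described above, counting in-bound hyperlinks, can be sketched as an in-degree count over a link graph. The function name and the input shape (a mapping from each page to the pages it links to) are hypothetical. Ignoring self-links and repeated links from the same page is an assumption. The resulting counts would then need to reach a PostingSource. One possible route is storing them in a document value slot for something like Xapian's ValueWeightPostingSource, but this sketch does not cover that step.

```python
from collections import Counter


def count_inbound_links(outlinks):
    """Count in-bound links per page (hypothetical helper).

    outlinks maps each page id to the page ids it links to.
    Assumptions: a page linking to itself is ignored, and several links
    from one page to the same target count once.
    """
    counts = Counter()
    for source, targets in outlinks.items():
        for target in set(targets):  # de-duplicate links from this source
            if target != source:     # skip self-links
                counts[target] += 1
    return counts


graph = {"a": ["b", "c", "b"], "b": ["c"], "c": ["a", "c"]}
print(dict(count_inbound_links(graph)))
```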

Thanks for the suggestion though.

Cheers,
    Olly