[Xapian-devel] GSoC, Xapian Project Weighting Schemes
azeem201001 at yahoo.in
Mon Apr 2 07:10:15 BST 2012
I am very sorry I did not include the xapian-devel mailing list in my previous mail.
Thanks for responding to my mail.
From: Olly Betts <olly at survex.com>
To: Mohd Azeem <azeem201001 at yahoo.in>
Cc: Parth Gupta <parthg.88 at gmail.com>
Sent: Saturday, 31 March 2012 11:40 AM
Subject: Re: GSoC, Xapian Project Weighting Schemes
Please DON'T mail individual mentors privately - use the xapian-devel
mailing list instead.
On Sat, Mar 31, 2012 at 01:35:16PM +0800, Mohd Azeem wrote:
> Presently Xapian
> provides the ability to rank search results by mathematical
> formulas like tf*idf and BM25.
Actually, you can already rank results by incoming hyperlink counts, or
any query-independent factor(s) you want to keep track of, and you can
combine that with term-based weights. This is done by creating a
PostingSource subclass and using it in the query:
> weight S = S1(weight calculated by BM25) * S2(weight of document
> calculated based on
You can't multiply the factors like this with a PostingSource, only add
them - is there any theoretical or experimental basis for multiplying
the weight contributions in this situation?
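To make the difference concrete, here is a minimal standalone sketch (it does not use the Xapian API) showing that adding and multiplying the two contributions can produce different rankings. The ScoredDoc struct, the function names, and the example numbers are all hypothetical, chosen only for illustration:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical per-document scores; these are not Xapian types.
struct ScoredDoc {
    std::string name;
    double term_weight;   // e.g. a BM25-style score for the query terms
    double static_weight; // e.g. a score derived from the in-bound link count
};

// Additive combination: the query-independent weight is summed with the
// term-based weight.
double combine_add(const ScoredDoc& d) {
    return d.term_weight + d.static_weight;
}

// Multiplicative combination, as proposed in the quoted mail.
double combine_multiply(const ScoredDoc& d) {
    return d.term_weight * d.static_weight;
}

// Return document names ordered by descending combined score.
template <typename Combine>
std::vector<std::string> rank_documents(std::vector<ScoredDoc> docs,
                                        Combine combine) {
    std::stable_sort(docs.begin(), docs.end(),
                     [&](const ScoredDoc& a, const ScoredDoc& b) {
                         return combine(a) > combine(b);
                     });
    std::vector<std::string> names;
    for (const auto& d : docs) {
        names.push_back(d.name);
    }
    return names;
}
```

With documents A (term weight 5, static weight 1) and B (term weight 2, static weight 3), the sum ranks A first (6 against 5) while the product ranks B first (6 against 5), so the choice of combination is not just a matter of scale.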
So your suggested project would involve counting up in-bound hyperlinks,
and writing a simple PostingSource class to use them, plus perhaps
adding a new query operator which multiplies weights. Unfortunately
that doesn't seem like it would be nearly enough work for a GSoC project.
Thanks for the suggestion though.
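For a sense of scale, the link-counting step could look roughly like the standalone sketch below. It assumes the links are already available as (source, target) URL pairs; the function name, the input format, and the decision to skip self-links are assumptions made for illustration, not anything taken from Xapian or from the proposal:

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Count in-bound links per target URL from (source, target) pairs.
// Assumption: self-links are ignored; every other pair counts once,
// including repeated pairs.
std::map<std::string, unsigned> count_inbound_links(
    const std::vector<std::pair<std::string, std::string>>& links) {
    std::map<std::string, unsigned> counts;
    for (const auto& link : links) {
        if (link.first == link.second) {
            continue; // a page linking to itself says nothing about popularity
        }
        ++counts[link.second];
    }
    return counts;
}
```

One plausible next step would be to store each count in a document value slot at index time (for example encoded with Xapian::sortable_serialise) so that a value-based posting source such as Xapian::ValueWeightPostingSource can read it back at query time.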