Questions about Weighting Schemes project

Fri Apr 5 07:01:07 BST 2019

On Thu, Apr 04, 2019 at 02:42:14PM +0530, Sourav Saha wrote:
> I was going through the Xapian code base of different weighting schemes. In
> the lmweight code, I found that we are returning non-negative numbers
> from the get_maxpart and get_sumpart methods. Is this to avoid negative weight?

Yes - Xapian requires that each term contribute a non-negative weight.
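
To illustrate why this matters for language-model weights: the log of a
probability is never positive, so a per-term weight based on a bare log
probability would be negative. Below is a minimal sketch of one way to keep
the contribution non-negative. The function name and the `scale` parameter
are hypothetical and not taken from the lmweight code.

```python
import math

def clamped_lm_weight(p_term, scale):
    """Log of a scaled term probability, clamped so it is never negative.

    p_term: smoothed probability of the term, in (0, 1].
    scale:  hypothetical positive multiplier applied before the log
            (an assumption for illustration only).
    """
    weight = math.log(scale * p_term)
    # A term may contribute zero, but never a negative amount.
    return max(weight, 0.0)

# A very rare term whose scaled probability is below 1 is clamped to zero.
print(clamped_lm_weight(0.0001, 1000.0))
# A more probable term keeps its positive log weight.
print(clamped_lm_weight(0.01, 1000.0))
```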

> Also in the Language Model with Jelinek Mercer Smoothing (LM-JM)
> implementation, I don't see any idf effect involved in that equation. The
> LM-JM equation looks something like this:
>   LAMBDA * MLE(t,d) + (1-LAMBDA) * MLE(t,c)
> However, if we bind it with idf, it will look like:
>
>   1 + (LAMBDA / (1-LAMBDA)) * (MLE(t,d) / MLE(t,c))
> which is widely used everywhere. I am planning to submit a patch with an
> improved representation of LM-JM with the idf effect shortly. Kindly let me
> know of any concerns.

Interesting.  I wonder if our JM implementation is just wrong, or if
there are older and newer variants or something.
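
For what it's worth, the two quoted forms are related algebraically:
dividing the first by (1-LAMBDA) * MLE(t,c), which does not depend on the
document, gives the second. Assuming the per-term score is the log of these
quantities (the log is not written in the quoted equations), the two scores
differ by a constant per query term, so they should rank documents the same
way when every query term is scored for every document. The sketch below
checks that numerically; the function names and sample values are made up
for illustration.

```python
import math

def jm_mixture(lam, p_doc, p_coll):
    """First quoted form: LAMBDA * MLE(t,d) + (1-LAMBDA) * MLE(t,c)."""
    return lam * p_doc + (1.0 - lam) * p_coll

def jm_ratio(lam, p_doc, p_coll):
    """Second quoted form: 1 + (LAMBDA / (1-LAMBDA)) * (MLE(t,d) / MLE(t,c))."""
    return 1.0 + (lam / (1.0 - lam)) * (p_doc / p_coll)

lam, p_coll = 0.7, 0.002                 # illustrative values only
constant = math.log((1.0 - lam) * p_coll)  # document-independent offset

for p_doc in (0.0, 0.01, 0.05):
    diff = math.log(jm_mixture(lam, p_doc, p_coll)) \
         - math.log(jm_ratio(lam, p_doc, p_coll))
    # The difference is the same for every document.
    print(p_doc, math.isclose(diff, constant))
```

The second form is always at least 1, so its log is never negative, and it
is exactly 0 when the term does not occur in the document.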

Do you have a reference handy?

Cheers,
    Olly