<div dir="ltr">Hi Olly,<div><br></div><div>Thanks for the early reply. I looked more closely at the tf-idf implementation and found that the following document length normalizations are not implemented [1]:</div><div><br></div><div>1) Cosine normalization</div><div>2) Sum of weights normalization</div><div>3) Fourth normalization</div><div>4) Max weight normalization</div><div><br></div><div>Each of these normalization factors is constant at the document level, so for each combination of wdf and idf weighting schemes (which are already implemented), the document normalization factor should be stored in the backend (index).</div><div><br></div><div>Furthermore, multiplying by the document normalization factor while weighting each term seems redundant. Could we instead have an abstract function such as get_mulextra in the Weight class that returns a term-independent document normalization factor? This factor would be multiplied by the document's weight for the query to give the document's final weight (rank) for that query.</div><div><br></div><div>Please let me know whether I am thinking in the right direction.</div><div><br></div><div>References:</div><div>[1] Nicola Polettini. The Vector Space model in Information Retrieval - Term Weighting Problem. 
January 2004.</div><div><br></div><div>Regards,</div><div><div style="font-size:13px">Prachi Prakash</div><div style="font-size:13px">Final year Graduate Student</div><div style="font-size:13px">LinkedIn: <a href="https://www.linkedin.com/in/prachi-prakash-7b674351/" target="_blank">https://www.<wbr>linkedin.com/in/prachi-<wbr>prakash-7b674351/</a></div><div style="font-size:13px">github: <a href="https://github.com/PrachiPrakash?tab=activity" target="_blank">https://github.com/<wbr>PrachiPrakash?tab=activity</a></div></div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Mar 5, 2017 at 8:41 PM, prachi prakash <span dir="ltr">&lt;<a href="mailto:prachiprakash80@gmail.com" target="_blank">prachiprakash80@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hello Everyone,<div><br></div><div>I am a second year graduate student at IIIT-Bangalore and my interest is in the field of Information Retrieval. I have successfully compiled Xapian from source and have implemented some examples. While going through the project list, I found that the Weighting Schemes project is the one I would like to contribute to. So I went through xapian-core/weight, where most of the schemes are already present, and I also went through the Bigram-model, which is outside the tree and not merged yet.</div><div><br></div><div>Could anyone please point me to the weighting schemes that are not implemented yet, so that I can start looking at them?</div><div><br></div><div>Regards,</div><div>Prachi Prakash</div><div>Final year Graduate Student</div><div>LinkedIn: <a href="https://www.linkedin.com/in/prachi-prakash-7b674351/" target="_blank">https://www.<wbr>linkedin.com/in/prachi-<wbr>prakash-7b674351/</a></div><div>github: <a href="https://github.com/PrachiPrakash?tab=activity" target="_blank">https://github.com/<wbr>PrachiPrakash?tab=activity</a></div></div>

</blockquote></div><br></div>