Weighting Scheme Project - Doubts

Vivek Pal vivekpal.dtu at gmail.com
Sat Apr 30 12:29:17 BST 2016


Hello,

I've edited my project wiki page and added the project description.
Please have a look at it. Before I go on to add the project plan, I need
to discuss a few things and clear up some related doubts:

1. I shall be improving the existing weighting schemes, which it seems can
be done in two ways. The first is to modify the existing functions in
place, discarding their current behaviour. The second is to retain the
existing functions and provide the modified functions alongside them as
additional functionality for users. Both methods would involve making
changes to the existing weighting scheme source code in
xapian-core/weight. Which of the two would be preferred?

2. I am not sure about the whole testing process for the weighting schemes.
Please brief me on it or direct me to some useful links. While adding
support for the freq and squared normalizations during the application
period, I simply added new test cases for these two normalizations, but it
would be helpful to understand the whole process Xapian uses for testing.
I've found this page helpful so far:
https://xapian.org/docs/tests

3. Performance evaluation of weighting schemes: I'm thinking of using a
TREC collection and calculating precision, recall and MAP (mean average
precision). Would that be the right way to go? This will be done in the
second half of the coding period, after the implementation (with testing)
is completely done.

4. Implementation of the remaining SMART normalizations for the tf-idf
weighting scheme: Earlier on IRC, Olly rightly pointed out that this would
be tricky and would require changes to how the matcher handles certain
things in Xapian. In addition, this pull request
https://github.com/xapian/xapian/pull/81 has some work done on "max"
normalization. I wouldn't mind working on it if there are no issues with
that. I'm thinking of listing these as "Additional/optional tasks" on my
project wiki page, and I'm inclined to work on them only after I have
finished the proposed project.
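
For point 4, the sketch below illustrates why "max" normalization differs
from the simpler wdf normalizations. It assumes the definition commonly
given for SMART max normalization (wdf divided by the largest wdf in the
document); whether the pull request uses exactly this form is an
assumption. The function names are hypothetical and this is not Xapian's
implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// No normalization: depends only on the term's own wdf.
double wdfn_none(double wdf) { return wdf; }

// Logarithmic normalization: also depends only on the term's own wdf.
double wdfn_log(double wdf) {
    return wdf > 0.0 ? 1.0 + std::log(wdf) : 0.0;
}

// Max normalization (assumed definition): wdf / max wdf in the document.
// Unlike the two functions above, this needs a per-document statistic,
// passed here as the wdf values of all terms in the document.
double wdfn_max(double wdf, const std::vector<double>& doc_wdfs) {
    if (doc_wdfs.empty()) return 0.0;
    double max_wdf = *std::max_element(doc_wdfs.begin(), doc_wdfs.end());
    return max_wdf > 0.0 ? wdf / max_wdf : 0.0;
}
```

For example, a term with wdf 2 in a document whose largest wdf is 4 would
get a max-normalized value of 0.5 under this definition.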

Also, my semester exams start next Monday (May 9, 2016), so it is likely
that I'll pause project work and resume as soon as the exams are over.

Thanks,
Vivek