GSoC: Weighting Schemes

Vivek Pal vivekpal.dtu at gmail.com
Sun May 8 12:06:16 BST 2016


Hi James,

Thanks for clearing up the doubts I had earlier.

>>if we can introduce the variants using optional parameters that default to
>>(effectively) 'off' that might be better than distinct ones,

Yes, this will definitely be the better approach for introducing the
variants of existing weighting functions. Thanks for the suggestion.
Next, I will try to come up with draft pseudo-code for each of those
variants in the next few days. It would be helpful if you could review
them before the coding period begins, as that will give me a clear
picture of the implementation in advance.
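
As a rough illustration of the optional-parameter approach, here is a minimal
sketch. It does not use Xapian's actual classes: SketchWeight, k1, delta and
tf_component are hypothetical names, and the variant shown (a constant added
to the term-frequency component) is only a stand-in for whichever variants
get implemented. The point is that the extra parameter defaults to 0.0, so
existing callers see unchanged behaviour.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a weighting scheme; NOT Xapian's real API.
class SketchWeight {
    double k1_;     // term-frequency saturation parameter
    double delta_;  // variant parameter; 0.0 means the variant is "off"

  public:
    // The variant parameter comes last and defaults to "off", so code that
    // constructs SketchWeight() or SketchWeight(k1) keeps its old results.
    explicit SketchWeight(double k1 = 1.2, double delta = 0.0)
        : k1_(k1), delta_(delta) {}

    // Saturating term-frequency component plus the optional constant.
    double tf_component(double wdf) const {
        return (k1_ + 1.0) * wdf / (k1_ + wdf) + delta_;
    }
};

// With the default, the variant contributes nothing.
void check_default_is_off() {
    SketchWeight plain;
    SketchWeight same(1.2, 0.0);
    assert(plain.tf_component(3.0) == same.tf_component(3.0));
}
```

A caller that wants the variant would construct SketchWeight(1.2, 1.0);
everything else stays on the original formula.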

>>you need to independently calculate, or independently
>>verify, the correct outputs for some test sets (you should be able to
>>use the existing test databases).

So, to test the implemented weighting functions completely, I'd need to
do careful manual testing of the code as well as automated testing through
xapian-core/tests/api_weight.cc using the existing test databases.
Please correct me if I am wrong or missing something here.
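
To make the "independently verify" step concrete, here is a small sketch of
the idea. It is not written against Xapian's test harness: weight_under_test
and nearly_equal are hypothetical names, and the plain wdf * log(N / termfreq)
formula and the statistics used are assumptions chosen only for illustration.
The expected value is worked out separately (by hand or with a calculator) and
compared with the computed weight within a tolerance.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical function under test: a plain tf-idf style weight.
double weight_under_test(double wdf, double termfreq, double doc_count) {
    return wdf * std::log(doc_count / termfreq);
}

// Compare doubles with a relative tolerance rather than exact equality.
bool nearly_equal(double a, double b, double eps = 1e-9) {
    double scale = std::max(1.0, std::max(std::fabs(a), std::fabs(b)));
    return std::fabs(a - b) <= eps * scale;
}

// Expected value computed independently: 3 * ln(8 / 2) = 3 * ln(4).
void check_against_hand_computed_value() {
    const double expected = 4.1588830833596715;
    assert(nearly_equal(weight_under_test(3.0, 2.0, 8.0), expected));
}
```

In a real test the statistics would come from one of the existing test
databases rather than being hard-coded.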

>>You should talk to Gaurav about that, in particular looking at the evaluation
>>work he did previously
>>(https://github.com/samuelharden/xapian-evaluation)

I've started exploring this evaluation module and am trying to get it
running on my system. I'm facing some issues initially, so I'm trying to
sort them out with help from Gaurav on IRC.

>>We may want to take the opportunity to discuss whether parts or all of
>>this evaluation framework can be moved into the main Xapian repo, and
>>if there are changes that will make it easier to use for evaluation in
>>future.

Yes, it'd be a huge plus for us, as it would help to compare
Xapian's performance across the different weighting functions.
I'll add this under "Additional tasks" in my project wiki and would like to
work on it with Gaurav after completing my GSoC project.

>>If Nishad doesn't find time to take this forward,
>>it should be fine for you to pick up and complete this normalisation.

Sure, I'll do it as part of the additional tasks after the GSoC period :)

>>Yes, that's a good idea. You might want, at the end of the project, to
>>transfer any remaining ideas and thoughts either into the bug tracker
>>or to somewhere on the wiki

I've got 3 ideas for this section so far, after all the discussions:
1. Implement the remaining SMART normalizations of the tf-idf weighting function.
2. Work with Gaurav to get parts of the evaluation module into the main repo,
to start with.

>>Good luck with them!

Thanks :)

Regards,
Vivek