GSoC 2016 - Introduction

Sun May 1 17:23:58 BST 2016

Before going ahead with the tests you mentioned above, I would like to
clarify a few higher-level points that I am still unsure about.

1) As discussed during the IRC interview, it was suggested that I first
implement a plain K-means clustering and then add the PSO module as an
optional feature that improves clustering quality at the cost of speed.
Is this the right way to view the project?

2) Isn't it easier to first think about the API for the clustering
functionality rather than deriving it through test cases? (I am not used
to working this way, so I find it hard to think in reverse.) Do correct me
if writing the tests first is the better way.

3) The fitness measure I plan to use for the PSO part and also for
evaluating the clustering results is ADDC (average distance of documents to
the cluster centroid). Is this the best fit?

4) For the K-means and PSO parameters, can default values be set that can
then be overridden for special use cases?

5) There is already a clustering branch from earlier work. Should I
continue working on that existing implementation, or start afresh?
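
To make question 3 concrete, here is a rough Python sketch of one reading
of ADDC: for each cluster, take the mean distance from its documents to the
cluster centroid, then average those means over the clusters. Euclidean
distance on plain numeric vectors and the skipping of empty clusters are
assumptions, and the names addc and euclidean are hypothetical; none of
this is Xapian API.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def addc(clusters, centroids, distance=euclidean):
    """Average distance of documents to the cluster centroid.

    clusters:  list of clusters, each a list of document vectors
    centroids: list of centroid vectors, one per cluster
    Lower values mean tighter clusters.
    """
    per_cluster_means = []
    for docs, centroid in zip(clusters, centroids):
        if not docs:
            continue  # assumption: empty clusters do not contribute
        total = sum(distance(doc, centroid) for doc in docs)
        per_cluster_means.append(total / len(docs))
    if not per_cluster_means:
        return 0.0
    return sum(per_cluster_means) / len(per_cluster_means)
```

Because a lower value means more compact clusters, a PSO fitness built on
this would be minimised rather than maximised.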

Currently I'm looking at the previous clustering branch and the test API,
and getting familiar with the parts of the codebase I don't know yet. Once
I am confident, I'll go ahead with a simple test for the clustering as you
suggested.
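
As an illustration only, a test-first starting point could look like the
Python sketch below: a plain K-means with overridable defaults, plus one
test on two obviously separated groups. The function name kmeans, its
parameters and their default values are all hypothetical and are not taken
from the existing clustering branch or the Xapian test API. The
initial_centroids argument is where a later PSO stage could supply better
starting points.

```python
import random


def kmeans(points, k, initial_centroids=None, max_iterations=100, seed=0):
    """Plain K-means on tuples of floats.

    Returns (centroids, assignment), where assignment[i] is the index of
    the cluster that points[i] belongs to. All defaults are placeholders.
    """
    if initial_centroids is None:
        # Default seeding: pick k distinct input points at random.
        initial_centroids = random.Random(seed).sample(points, k)
    centroids = [tuple(c) for c in initial_centroids]
    assignment = None
    for _ in range(max_iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignment = [
            min(range(k),
                key=lambda i: sum((x - y) ** 2
                                  for x, y in zip(p, centroids[i])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centroids[i] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, assignment


def test_two_obvious_clusters():
    points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    centroids, assignment = kmeans(
        points, k=2, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
    assert assignment == [0, 0, 1, 1]
    assert centroids == [(0.0, 0.5), (10.0, 10.5)]
```

Writing the test first in this way fixes the call signature and the return
shape before any real implementation exists, which is one way the API can
fall out of the test cases.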

Thanks