[Xapian-devel] GSoC 2014: Clustering of Search Results

Abhishek Gupta a.gupps at gmail.com
Thu Mar 20 20:07:31 GMT 2014


Sir,

I am Abhishek Gupta. I know I am quite late for the project discussion
because I came to know about GSoC a bit lately but still I would like to
discuss this project which interests me a lot. I know I have to submit some
code so as to show my skill set but as the deadline is quite near I will
submit the patches or exercises after the deadline to strengthen my
application and show my coding skill.
I read your existing source code for the clustering which is quite slow
because of the hierarchical based clustering which is not required at all.*You
have already provided with the number of clusters you should have in the
end*. So for this we can employ K-means algorithm which can perform far
better than the current algorithm.

1) Hierarchical clustering have high memory requirements *O(n*n)* in
comparison to *O(n+K)* space complexity of K-means algorithm, where*n* is
the number of elements and* K *is the number of clusters.
2) Hierarchical clustering running time is *O(n*n*n)* on the other hand
K-means algorithm has time complexity of *O(n)*.
3) K-means improves the clustering iteratively, more you run the code more
better you will get the results.

One thing that K-means lacks is its non-deterministic outcome. Every time
it will produce different clusters. But we can always run the algorithm
10-12 times and then take the average even then it will perform far better
than the hierarchical one.

So I would like to propose this algorithm which can perform better than the
hierarchical one. After that to improve the clustering more we can also
implement K-medoids/K-means++ clustering methods.

I would you give some reviews regarding the proposal, so that I can submit
the proposal at time.

Thanks and Regards
Abhishek Gupta
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20140321/7bda8301/attachment.html>


More information about the Xapian-devel mailing list