[Xapian-devel] GSoC 2014: Clustering of Search Results
Abhishek Gupta
a.gupps at gmail.com
Thu Mar 20 20:07:31 GMT 2014
Sir,
I am Abhishek Gupta. I know I am quite late for the project discussion
because I came to know about GSoC a bit lately but still I would like to
discuss this project which interests me a lot. I know I have to submit some
code so as to show my skill set but as the deadline is quite near I will
submit the patches or exercises after the deadline to strengthen my
application and show my coding skill.
I read your existing source code for the clustering which is quite slow
because of the hierarchical based clustering which is not required at all.*You
have already provided with the number of clusters you should have in the
end*. So for this we can employ K-means algorithm which can perform far
better than the current algorithm.
1) Hierarchical clustering have high memory requirements *O(n*n)* in
comparison to *O(n+K)* space complexity of K-means algorithm, where*n* is
the number of elements and* K *is the number of clusters.
2) Hierarchical clustering running time is *O(n*n*n)* on the other hand
K-means algorithm has time complexity of *O(n)*.
3) K-means improves the clustering iteratively, more you run the code more
better you will get the results.
One thing that K-means lacks is its non-deterministic outcome. Every time
it will produce different clusters. But we can always run the algorithm
10-12 times and then take the average even then it will perform far better
than the hierarchical one.
So I would like to propose this algorithm which can perform better than the
hierarchical one. After that to improve the clustering more we can also
implement K-medoids/K-means++ clustering methods.
I would you give some reviews regarding the proposal, so that I can submit
the proposal at time.
Thanks and Regards
Abhishek Gupta
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20140321/7bda8301/attachment.html>
More information about the Xapian-devel
mailing list