<div dir="ltr"><span style="font-family:arial,sans-serif;font-size:13px">Sir,</span><div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">I am Abhishek Gupta. I know I am quite late to the project discussion because I only recently came to know about GSoC, but I would still like to discuss this project, which interests me a lot. I know I should submit some code to show my skill set, but as the deadline is quite near, I will submit patches or exercises after the deadline to strengthen my application and demonstrate my coding skills.</div>
<div style="font-family:arial,sans-serif;font-size:13px">I read your existing source code for the clustering, which is quite slow because it uses hierarchical clustering. That is not required at all: <b>you are already provided with the number of clusters you should have in the end</b>. So we can employ the K-means algorithm, which can perform far better than the current algorithm.</div>
<div style="font-family:arial,sans-serif;font-size:13px"><br></div><div style="font-family:arial,sans-serif;font-size:13px">1) Hierarchical clustering has high memory requirements, <font face="courier new, monospace"><b>O(n*n)</b></font>, in comparison to the <font face="courier new, monospace"><b>O(n+K)</b></font> space complexity of the K-means algorithm, where <font face="courier new, monospace"><b>n</b></font> is the number of elements <font face="arial, helvetica, sans-serif">and</font><b style="font-family:'courier new',monospace"> K </b><font face="arial, helvetica, sans-serif">is the number of clusters.</font></div>
<div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif">2) The running time of hierarchical clustering is </font><font face="courier new, monospace"><b>O(n*n*n)</b></font><font face="arial, helvetica, sans-serif"> in a naive implementation, whereas each K-means iteration takes </font><b style="font-family:'courier new',monospace">O(n*K)</b><font face="arial, helvetica, sans-serif"> time, which is linear in n for a fixed K.</font></div>
<div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif">3) K-means improves the clustering iteratively: the more iterations you run, the better the results get, until it converges.</font></div><div style="font-family:arial,sans-serif;font-size:13px">
<font face="arial, helvetica, sans-serif"><br></font></div><div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif">One drawback of K-means is its non-deterministic outcome: with random starting centers, each run may produce different clusters. But we can always run the algorithm 10-12 times and then keep the best result; even then it should perform far better than the hierarchical one.</font></div>
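<div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif"><br></font></div><div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif">To make the idea concrete, here is a minimal sketch in plain Python; it is not based on the project's code. The names kmeans and best_of_restarts are hypothetical, points are assumed to be tuples of numbers, and distance is assumed to be squared Euclidean. Cluster labels from different runs do not line up, so this sketch keeps the run with the smallest total squared distance to the centers rather than averaging runs; that selection rule is my assumption.</font></div>

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, rng, max_iterations=100):
    """One run of Lloyd's K-means; returns (centers, labels, cost)."""
    centers = rng.sample(points, k)  # random initial centers taken from the data
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest center, O(n*K).
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    cost = sum(squared_distance(p, centers[label]) for p, label in zip(points, labels))
    return centers, labels, cost


def best_of_restarts(points, k, restarts=10, seed=0):
    """Run K-means several times and keep the run with the lowest cost."""
    rng = random.Random(seed)
    runs = (kmeans(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[2])
```

<div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif">Only the points, the K centers and one label per point are stored, and each iteration does n*K distance computations.</font></div>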
<div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif"><br></font></div><div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif">So I would like to propose this algorithm, which can perform better than the hierarchical one. After that, to improve the clustering further, we can also implement K-medoids clustering or K-means++ seeding.</font></div>
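<div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif"><br></font></div><div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif">For reference, K-means++ changes only how the initial centers are chosen: each new center is drawn with probability proportional to its squared distance from the nearest center already picked. A rough sketch in plain Python follows; the function name kmeans_pp_seeds is hypothetical, and points are again assumed to be tuples of numbers.</font></div>

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng):
    """Pick k initial centers using K-means++ style seeding."""
    centers = [rng.choice(points)]  # first center: uniform at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        weights = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            centers.append(rng.choice(points))
        else:
            # Far-away points are more likely to become the next center.
            centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers
```

<div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif">The returned centers would replace the purely random starting centers of an ordinary K-means run; the iterations themselves stay the same.</font></div>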
<div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif"><br></font></div><div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif">I would be grateful if you could give me some feedback on the proposal, so that I can submit it on time.</font></div>
<div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif"><br></font></div><div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif">Thanks and Regards</font></div>
<div style="font-family:arial,sans-serif;font-size:13px"><font face="arial, helvetica, sans-serif">Abhishek Gupta</font></div></div>