<div dir="ltr"><div><div><div>Hello,<br><br></div>Work on stopword removal and stemming is now almost complete, and the run time for KMeans has come down as a result: around 0.15 s for 100 documents, rising to around 1.2 s for 500 documents and 2.5 s for 1000 documents. I measured this on the BBC datasets with k=5, since the dataset has 5 categories.<br><br></div>Going forward, the next step in optimizing KMeans is to use the faster variant developed by Charles Elkan, which reduces the number of distance computations. For this, I will give the user an option in the constructor to choose between the standard algorithm and Elkan's algorithm, and write a method within KMeans that implements the triangle inequality optimization. I will also be moving RoundRobin to the testsuite.<br><br></div>Thanks.<br></div>