<div dir="ltr"><div>Hello everyone. <br>I sent this mail quite a while ago and am still awaiting a reply. Thanks :)</div><div><br></div><div><div dir="ltr" style="font-size:13px">Looking at the existing approaches, it seems we have so far approached clustering with single-link hierarchical clustering and k-means, both of which appear to be slow for moderately sized datasets.<br></div><div dir="ltr" style="font-size:13px"><div><br></div><div>I would like to propose a density-based clustering technique for Xapian, based on DBSCAN or OPTICS, since these approaches can handle clusters of various shapes and sizes and are also resistant to noise.<br></div><div>Below are links to the papers on these algorithms:</div><div><a href="http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD-96.final.frame.pdf" target="_blank">http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD-96.final.frame.pdf</a><br></div><div><a href="http://fogo.dbs.ifi.lmu.de/Publikationen/Papers/OPTICS.pdf" target="_blank">http://fogo.dbs.ifi.lmu.de/Publikationen/Papers/OPTICS.pdf</a><br></div><div><br></div><div>With good indexing structures, the complexity of the above algorithms is O(n log n) on average, which is faster and more efficient than single-link and k-means.</div><div><br></div><div>Could you let me know whether this would be a good idea for a project? If not, how else could I approach this project?</div></div></div></div>