[Xapian-devel] GSOC-2015 : Clustering of search results

Richhiey Thomas richhiey.thomas at gmail.com
Wed Jan 21 12:12:04 GMT 2015


Hello everyone.
I sent this mail quite a while ago and am still awaiting a reply.
Thanks :)

Looking at the existing approaches, I gather that clustering has so far been
attempted with single-link hierarchical clustering and k-means, both of which
appear to be slow for moderately sized datasets.

I would like to propose a density-based clustering technique for Xapian
based on DBSCAN or OPTICS, since these approaches can handle clusters of
various shapes and sizes and are also resistant to noise.
Below are links to the papers describing them:
http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD-96.final.frame.pdf
http://fogo.dbs.ifi.lmu.de/Publikationen/Papers/OPTICS.pdf

With the use of good indexing structures, the complexity of the above
algorithms is O(n log n), which is faster and more efficient than single-link
clustering and k-means.
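
To make the idea concrete, here is a minimal sketch of DBSCAN in Python. It
is only an illustration under some assumptions: the names (dbscan,
region_query, NOISE) are hypothetical, the points are toy 2D tuples compared
with Euclidean distance rather than Xapian document vectors, and the
neighbourhood search is brute force, so this version is O(n^2) overall. The
O(n log n) figure would only apply if region_query were backed by a suitable
index.

```python
from math import dist  # Euclidean distance, available since Python 3.8

NOISE = -1  # hypothetical label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i].

    Brute force, O(n) per call. This is the step an index structure
    would replace to reach the better complexity.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return a list of labels: cluster ids 0, 1, ... or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, do not expand
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also core: keep growing
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

In this toy run the first three points form one cluster, the next three a
second cluster, and the isolated point (50, 50) is labelled as noise. The
number of clusters is not given in advance, unlike with k-means.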

Could I know whether this would be a good idea for a project? And if not,
how else can I approach this project?

