[Xapian-discuss] GSOC-2015 : Clustering of search results

James Aylett james-xapian at tartarus.org
Thu Jan 15 15:58:33 GMT 2015


Please do not repost the same email, even to different mailing lists. Someone will reply when they have time, and repeating the same information merely gives us more to work through before someone can get back to you.

J

On 15 Jan 2015, at 11:54, Richhiey Thomas <richhiey.thomas at gmail.com> wrote:

> Looking at the existing approaches, it seems clustering has so far been
> tried with single-link hierarchical clustering and k-means, which
> appear to be slow for moderately sized datasets.
> 
> I would like to propose a density-based clustering technique for Xapian
> based on DBSCAN or OPTICS, since these approaches can handle clusters of
> various shapes and sizes and are also resistant to noise.
> Below are links for papers on the same:
> http://www.dbs.ifi.lmu.de/Publikationen/Papers/KDD-96.final.frame.pdf
> http://fogo.dbs.ifi.lmu.de/Publikationen/Papers/OPTICS.pdf
> 
> With a good indexing structure, the average complexity of the above
> algorithms is O(n log n), which is faster and more efficient than
> single-link and k-means.
> 
> Could I know whether this would be a good idea for a project? And if not,
> how else can I approach this project?

-- 
 James Aylett, occasional trouble-maker
 xapian.org



