GSoC 2017

Richhiey Thomas richhiey.thomas at gmail.com
Mon Jan 23 12:23:40 GMT 2017


Hello devs,

My name is Richhiey Thomas and I'm studying Computer Engineering at
Mumbai University. I worked with Xapian during GSoC 2016 on Clustering of
Search Results. I would like to continue working on that project and was
wondering whether it would fit within the scope of GSoC 2017.

The clustering branch has a clustering API and a KMeans clusterer
implemented, but it hasn't been merged yet because it needs further
optimization and has some other smaller issues. I would like to finish the
work needed to merge this branch and then implement a hierarchical clusterer.

Also, a main reason for the drop in performance with a large document
corpus was the dimensionality of the document vectors. Latent semantic
analysis to reduce the size of the document vectors is therefore something
that could be necessary.
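
To illustrate the idea, here is a rough sketch (not code from the clustering
branch) of the core step of latent semantic analysis: projecting document
vectors onto their top k right singular vectors. It is plain Python and uses
power iteration in place of a proper SVD routine. The function names and the
toy term counts are hypothetical and chosen only for illustration.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_right_singular_vectors(docs, k, iters=200, seed=0):
    """Approximate the k leading right singular vectors of docs.

    docs is a documents x terms matrix. Power iteration is run on
    docs^T * docs, removing previously found directions at each step.
    Assumes docs has rank >= k (hypothetical helper, illustration only).
    """
    rng = random.Random(seed)
    docs_t = [list(col) for col in zip(*docs)]
    n_terms = len(docs[0])
    basis = []
    for _ in range(k):
        v = [rng.random() for _ in range(n_terms)]
        for _ in range(iters):
            w = matvec(docs_t, matvec(docs, v))
            # Deflate: drop components along vectors already found.
            for b in basis:
                proj = dot(w, b)
                w = [x - proj * y for x, y in zip(w, b)]
            norm = math.sqrt(dot(w, w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        basis.append(v)
    return basis


def reduce_documents(docs, k):
    """Project each document vector onto the k leading directions."""
    basis = top_right_singular_vectors(docs, k)
    return [[dot(doc, b) for b in basis] for doc in docs]


# Toy corpus: rows are documents, columns are term counts.
docs = [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 3, 1],
]
reduced = reduce_documents(docs, 2)
```

Each document goes from four dimensions to two here. Documents that share
terms stay close together in the reduced space, so a clusterer such as KMeans
would have smaller vectors to compare.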

I would appreciate your feedback on these ideas.

Thanks :)