A query about clustering Idea

James Aylett james-xapian at tartarus.org
Sun Mar 20 13:51:11 GMT 2016

On Sun, Mar 20, 2016 at 01:02:03PM +0530, MURTUZA BOHRA wrote:

> I am interested in the "clustering of search results" idea. I know clustering
> techniques from a machine learning perspective, and I am working out how to
> apply a machine learning technique to clustering search results. Five months
> ago I did a project on LSI (latent semantic indexing) and ranking search
> results. From that experience, if I am given the TF-IDF matrix, clustering
> can be done quickly and efficiently, which addresses the problem posed by the
> GSoC 2016 project.

Hi, Murtuza! It certainly sounds like you have the right background to
tackle this project.
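Murtuza's point is that clustering becomes straightforward once you have TF-IDF vectors. A small, self-contained sketch can illustrate this. It is not the code in the svn/clustering branch, and it is not any Xapian API. It assumes pre-tokenised documents, smoothed IDF weights, cosine similarity, and spherical k-means. The deterministic farthest-first seeding is chosen only so that the result is reproducible. All function names here are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Build L2-normalised sparse TF-IDF vectors (term -> weight) for tokenised docs.

    Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1, so no term weight is zero.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors; equals cosine similarity for unit vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iterations=20):
    """Spherical k-means: assign by cosine similarity, renormalise centroids each round."""
    # Farthest-first seeding (a deterministic stand-in for k-means++):
    # start from the first vector, then repeatedly add the least similar vector.
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(dot(v, c) for c in centroids))
        centroids.append(dict(far))

    assignment = [None] * len(vectors)
    for _ in range(iterations):
        new_assignment = [
            max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        for c in range(k):
            summed = defaultdict(float)
            for v, a in zip(vectors, assignment):
                if a == c:
                    for t, w in v.items():
                        summed[t] += w
            norm = math.sqrt(sum(w * w for w in summed.values()))
            if norm:  # keep the old centroid if the cluster is empty
                centroids[c] = {t: w / norm for t, w in summed.items()}
    return assignment


if __name__ == "__main__":
    docs = [
        "xapian search engine index".split(),
        "search engine query ranking".split(),
        "kmeans cluster centroid".split(),
        "cluster centroid distance kmeans".split(),
    ]
    print(kmeans(tfidf_vectors(docs), 2))  # → [0, 0, 1, 1]
```

In a real search-result clustering setting, the documents would be the top-N hits of a query. The term statistics would come from the index rather than being recomputed from the result set as they are here.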

> But I cannot find any document describing how it was implemented in GSoC
> 2010, which would help me understand issues that were not handled
> previously. Please help me access that document, or the part of the code
> where the actual clustering algorithm is implemented.

There was no 2010 GSoC work; there was some in 2014, but that project
was unsuccessful and, I believe, got no further than an untested KMeans
implementation.

The earlier implementation (which wasn't part of GSoC) is available in
the svn/clustering branch, as noted in the project description. Any
documentation is likely to be code comments; I don't believe there's
an internal architecture document for it. You can browse it on GitHub
(https://github.com/xapian/xapian/tree/svn/clustering), but it's
probably easier just to clone it locally and look around.


  James Aylett, occasional trouble-maker

More information about the Xapian-devel mailing list