clustering technique using lsi

James Aylett james-xapian at tartarus.org
Tue Mar 22 13:00:25 GMT 2016


On Tue, Mar 22, 2016 at 02:08:23PM +0530, MURTUZA BOHRA wrote:
> I am still trying to find a faster clustering technique for search
> results. One technique that strikes me is Latent Semantic Indexing,
> which might also give better clusters. With it we would not need to
> iterate over different values of 'k' (as in the K-means algorithm);
> we could cluster the whole search result in one go.
> 
> I am not sure this technique will actually help, so I need to test
> the algorithm first. Please help me figure this out.

Are you suggesting writing an implementation quickly and seeing what
happens with some real queries? Because that does sound like a good
way of deciding whether the algorithm is useful in our case, but you
don't have long -- about three days -- until your proposal needs to be
in, so I don't know if you have enough time to do that and write
everything up.
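
For a sense of scale, a quick experiment of that kind could be as small
as the rough sketch below. It is in Python rather than Xapian's C++, and
it makes several assumptions: documents are plain strings with raw term
counts, the LSI space comes from power iteration on the document-document
Gram matrix, and documents are grouped greedily by cosine similarity.
The function names, the rank and the threshold are all made up for
illustration; none of them come from Xapian. Note that this still needs
a rank and a similarity threshold to be chosen, even though it doesn't
need the number of clusters up front.

```python
import math
from collections import Counter


def doc_term_rows(docs):
    """One row of raw term counts per document (hypothetical helper)."""
    vocab = sorted({t for d in docs for t in d.lower().split()})
    col = {t: i for i, t in enumerate(vocab)}
    rows = []
    for d in docs:
        row = [0.0] * len(vocab)
        for t, c in Counter(d.lower().split()).items():
            row[col[t]] = float(c)
        rows.append(row)
    return rows


def lsi_coords(rows, rank, iters=200):
    """Document coordinates in a rank-dimensional LSI space.

    Uses power iteration with deflation on the Gram matrix A * A^T,
    whose eigenvectors are the left singular vectors of A.
    """
    n = len(rows)
    gram = [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]
    coords = [[] for _ in range(n)]
    for _ in range(rank):
        v = [1.0 + i for i in range(n)]  # fixed, non-uniform start vector
        start = math.sqrt(sum(x * x for x in v))
        v = [x / start for x in v]
        for _ in range(iters):
            w = [sum(g * x for g, x in zip(row, v)) for row in gram]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        gv = [sum(g * x for g, x in zip(row, v)) for row in gram]
        lam = sum(a * b for a, b in zip(v, gv))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))  # singular value
        for i in range(n):
            coords[i].append(v[i] * scale)
        # Deflate so the next pass finds the next eigenvector.
        gram = [[gram[i][j] - lam * v[i] * v[j] for j in range(n)]
                for i in range(n)]
    return coords


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def group_by_similarity(coords, threshold=0.8):
    """Greedy grouping: join the first cluster whose first member is close."""
    clusters = []
    for i, c in enumerate(coords):
        for cluster in clusters:
            if cosine(c, coords[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


docs = [
    "apple banana fruit",
    "banana fruit juice",
    "python code bug",
    "code bug fix",
]
clusters = group_by_similarity(lsi_coords(doc_term_rows(docs), rank=2))
print(clusters)
```

Something like this run over the top documents of a few real queries
would show fairly quickly whether the groupings look sensible, and how
the cost grows with the size of the result set.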

Alternatively, if you have an algorithm you are confident will be fast
enough and provide some useful clustering, then you could implement
that, and as a stretch goal in the project extend the system to allow
other algorithms to be used, and then look at implementing LSI. (If
you've delivered something earlier in the project, it'd be fine to do
something a bit more speculative later.)

J

-- 
  James Aylett, occasional trouble-maker
  xapian.org


