clustering technique using lsi

Olly Betts olly at survex.com
Tue Mar 22 18:51:52 GMT 2016


On Tue, Mar 22, 2016 at 02:08:23PM +0530, MURTUZA BOHRA wrote:
> How Latent semantic indexing would help?
> 
> In LSI we project query (considering as a pseudo document) on to the
> term-document vector space and based on some threshold we find the relevant
> documents. Very similarly if we use LSI for clustering, and instead of
> query if we take one of our search result and set different thresholds and
> based on each threshold we can cluster the search result at single shot.

So if I follow, you take one document (how do you decide which one?) and
then generate a set of clusters as (multi-dimensional) rings of increasing
radius around it?

That doesn't sound like it's going to do a good job of producing useful
clusters.  The group around the "seed" document is probably related,
but once you get beyond that the documents in the cluster are defined
only by distance from the seed.

In geographical terms, locations which are < 10km from a given point
might be a useful cluster, but locations between 10 and 20km from that
point are much less likely to be.
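
To make the geographical analogy concrete, here is a minimal sketch in
Python. It is only an illustration: the place names, the coordinates, the
10km ring width, and the helper functions ring_index and
max_pairwise_distance are all made up for this example and are not part of
Xapian or of any LSI implementation. It puts four points into rings around
a seed and measures how far apart the members of each ring can be.

```python
import math

def ring_index(point, seed, width):
    """Return which ring of the given width the point falls in (0 = innermost)."""
    return int(math.dist(point, seed) // width)

def max_pairwise_distance(points):
    """Largest distance between any two points in the list (0.0 if fewer than two)."""
    return max(
        (math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]),
        default=0.0,
    )

# Hypothetical data: a seed at the origin and four places, coordinates in km.
seed = (0.0, 0.0)
places = {
    "north_near": (0.0, 6.0),
    "south_near": (0.0, -6.0),
    "east_far": (15.0, 0.0),
    "west_far": (-15.0, 0.0),
}

# Group places by distance band from the seed: ring 0 is < 10km, ring 1 is 10-20km.
rings = {}
for name, xy in places.items():
    rings.setdefault(ring_index(xy, seed, 10.0), []).append(name)

# How spread out is each ring internally?
spread = {
    ring: max_pairwise_distance([places[n] for n in names])
    for ring, names in rings.items()
}

for ring in sorted(rings):
    print(ring, rings[ring], spread[ring])
```

Members of the innermost ring can never be more than 20km apart (the
diameter of the disc), whereas the two members of the 10-20km ring here are
30km apart, on opposite sides of the seed. Being the same distance from the
seed says nothing about being close to each other.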

Cheers,
    Olly



More information about the Xapian-devel mailing list