clustering technique using lsi

MURTUZA BOHRA murtuzabohra88 at gmail.com
Wed Mar 23 11:10:30 GMT 2016


Hello sir,

You have interpreted correctly that the clustering is done by drawing a
ring around a document (the basic idea of LSI). But the radius is not
increased so that each successive shell becomes another cluster. Rather,
one document is picked (based on relevance score) and a ring is formed
around it; the documents inside that ring form the first cluster. Then,
from the remaining documents (those in the search results but not yet in
any cluster), another document is picked and the next cluster is formed
around it. This continues until all the search results are exhausted.
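
To make the procedure concrete, here is a rough sketch in Python. It is
only an illustration under assumptions of mine, not a finished design:
the documents are assumed to be already projected into the LSI space,
the "ring" is taken to be a cosine-similarity threshold, and the names
seed_clusters, vectors, scores and threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def seed_clusters(vectors, scores, threshold=0.8):
    """Greedy seed-based clustering of search results.

    vectors   -- one vector per document (assumed to be LSI-projected)
    scores    -- relevance score per document, used to pick each seed
    threshold -- assumed ring "radius", expressed as minimum cosine similarity
    Returns a list of clusters, each a list of document indices.
    """
    # Unclustered documents, most relevant first.
    remaining = sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)
    clusters = []
    while remaining:
        seed = remaining[0]
        # The seed always belongs to its own cluster; the rest join it
        # only if they fall inside the ring around the seed.
        members = [seed] + [
            i for i in remaining[1:]
            if cosine(vectors[seed], vectors[i]) >= threshold
        ]
        clusters.append(members)
        taken = set(members)
        remaining = [i for i in remaining if i not in taken]
    return clusters


vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
scores = [0.9, 0.5, 0.8, 0.4]
print(seed_clusters(vectors, scores))  # [[0, 1], [2, 3]]
```

Each pass removes at least the seed from the remaining documents, so the
loop always terminates, and a document far from every seed simply ends
up in a cluster of its own.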

I have attached a file that illustrates the algorithm geometrically;
please have a look at it.

On Wed, Mar 23, 2016 at 12:21 AM, Olly Betts <olly at survex.com> wrote:

> On Tue, Mar 22, 2016 at 02:08:23PM +0530, MURTUZA BOHRA wrote:
> > How Latent semantic indexing would help?
> >
> > In LSI we project query (considering as a pseudo document) on to the
> > term-document vector space and based on some threshold we find the
> > relevant documents. Very similarly if we use LSI for clustering, and
> > instead of query if we take one of our search result and set different
> > thresholds and based on each threshold we can cluster the search result
> > at single shot.
>
> So if I follow, you take one document (how do you decide which) and then
> generate a set of clusters as (multi-dimensional) rings around it of
> increasing radius?
>
> That doesn't sound like it's going to do a good job of producing useful
> clusters.  The group around the "seed" document is probably related,
> but once you get beyond that the documents in the cluster are defined
> only by distance from the seed.
>
> In geographical terms, locations which are < 10km from a given point
> might be a useful cluster, but locations between 10 and 20km from that
> point is much less likely to be.
>
> Cheers,
>     Olly
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: LSI_Clustering.jpg
Type: image/jpeg
Size: 1476831 bytes
Desc: not available
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20160323/2a9be78a/attachment-0001.jpg>

