clustering technique using lsi

MURTUZA BOHRA murtuzabohra88 at gmail.com
Wed Mar 23 11:15:55 GMT 2016


I think I should explain the proposed algorithm more clearly in the proposal.
I did not do so because I thought it would make the proposal too long. Is
there a word limit for the proposal?
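
To make the procedure concrete, here is a minimal Python sketch of the
seed-based clustering described in my earlier message (quoted below). It is
only an illustration under stated assumptions: the documents are assumed to
be already projected into the LSI space and sorted by decreasing relevance
score, the "ring" is modelled as a cosine-similarity threshold, and the names
seed_clusters, cosine and threshold are hypothetical and not part of Xapian.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def seed_clusters(vectors, threshold):
    """Greedy seed-based clustering.

    Assumption: `vectors` are document vectors in the reduced (LSI) space,
    ordered by decreasing relevance score, so the first unclustered document
    is always the next seed.

    Returns a list of clusters; each cluster is a list of indices into
    `vectors`, with the seed first.
    """
    remaining = list(range(len(vectors)))
    clusters = []
    while remaining:
        seed = remaining[0]
        # The "ring": every unclustered document close enough to the seed.
        # The seed is always included, so the loop is guaranteed to terminate.
        cluster = [
            i for i in remaining
            if i == seed or cosine(vectors[seed], vectors[i]) >= threshold
        ]
        clusters.append(cluster)
        members = set(cluster)
        remaining = [i for i in remaining if i not in members]
    return clusters


# Toy 2-D vectors standing in for LSI-projected search results (made-up data).
docs = [
    (1.0, 0.1),
    (0.1, 1.0),
    (0.9, 0.2),
    (0.2, 0.9),
    (-1.0, -1.0),
]
clusters = seed_clusters(docs, threshold=0.9)
print(clusters)
```

Each pass removes at least the seed from the remaining results, so the loop
runs until every search result belongs to exactly one cluster. How the
threshold should be chosen is left open here.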

On Wed, Mar 23, 2016 at 4:40 PM, MURTUZA BOHRA <murtuzabohra88 at gmail.com>
wrote:

> Hello sir,
>
> You have interpreted correctly that clustering will be done by generating
> a ring around a document (i.e. the basic idea of LSI). But it is not a
> matter of increasing the radius so that each successive shell becomes
> another cluster. Rather, it would pick one document (based on relevance
> score) and form a ring around it to cluster the documents inside it. Then,
> from the remaining documents (those in the search results but not yet in
> a cluster), another document will be picked and the next cluster will be
> formed. This will go on until all the search results are exhausted.
>
> I have attached a file that illustrates the algorithm geometrically;
> please have a look at it.
>
> On Wed, Mar 23, 2016 at 12:21 AM, Olly Betts <olly at survex.com> wrote:
>
>> On Tue, Mar 22, 2016 at 02:08:23PM +0530, MURTUZA BOHRA wrote:
>> > How would latent semantic indexing help?
>> >
>> > In LSI we project the query (treated as a pseudo-document) onto the
>> > term-document vector space and, based on some threshold, find the
>> > relevant documents. Similarly, if we use LSI for clustering and take one
>> > of our search results instead of the query, we can set different
>> > thresholds and, based on each threshold, cluster the search results in a
>> > single shot.
>>
>> So if I follow, you take one document (how do you decide which) and then
>> generate a set of clusters as (multi-dimensional) rings around it of
>> increasing radius?
>>
>> That doesn't sound like it's going to do a good job of producing useful
>> clusters.  The group around the "seed" document is probably related,
>> but once you get beyond that the documents in the cluster are defined
>> only by distance from the seed.
>>
>> In geographical terms, locations which are < 10km from a given point
>> might be a useful cluster, but locations between 10 and 20km from that
>> point are much less likely to be.
>>
>> Cheers,
>>     Olly
>>
>
>

