<div dir="ltr">

<p style="margin-bottom:0.35cm;line-height:115%">Hi Developers,</p><p style="margin-bottom:0.35cm;line-height:115%"><span style="line-height:115%">I am Ganesh Prabu, a final-year computer science student at

SASTRA University, India. </span><span style="line-height:14.9499998092651px">I read through the project ideas page and found that Clustering of Search Results is the one that best fits my profile. Before proceeding further,</span><span style="line-height:115%"> I will briefly introduce myself and my

programming background.</span></p>

<p style="margin-bottom:0.35cm;line-height:115%">About:</p>

<p style="margin-bottom:0.35cm;line-height:115%">I have excellent algorithmic skills and a good grasp of object-oriented design patterns. I did my internship at KLA-Tencor, where I worked on projects involving multithreading in C# and C++, so I have about five months of industry experience. I have implemented data mining algorithms as part of my coursework, and I have used CUDA to generate Mandelbrot and Julia sets. I am good at benchmarking and always look for ways to improve a method.</p><p style="margin-bottom:0.35cm;line-height:115%">Besides, I have done several projects, including a Chain Reaction game (JavaScript) and AI Snake. I won first place in a Microsoft-conducted intra-college competition, and won Raspberry Pi kits from KLA-Tencor for developing an OMR reader. I also participate in CodeChef and HackerRank to sharpen my algorithmic skills. Here are my LinkedIn and GitHub accounts:</p><p style="margin-bottom:0.35cm;line-height:115%"><span style="line-height:115%"><a href="https://www.linkedin.com/in/ganeshpraburavi">https://www.linkedin.com/in/ganeshpraburavi</a></span></p>

<p style="margin-bottom:0.35cm;line-height:115%"><a href="https://github.com/ganeshpraburavi">https://github.com/ganeshpraburavi</a></p>

<p style="margin-bottom:0.35cm;line-height:115%">I started reading through the existing code, which implements the K-Means algorithm with TF-IDF as the basis for the similarity measure.</p>
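<p style="margin-bottom:0.35cm;line-height:115%">For reference, here is a minimal sketch of TF-IDF weighting combined with a similarity measure, which is the kind of document representation K-Means would operate on. It is not taken from the existing code: the function names <code>tfidf_vectors</code> and <code>cosine</code>, the choice of cosine similarity, and the sample documents are all assumptions made for illustration.</p>

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenised document."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # A term that occurs in every document gets weight log(1) = 0.
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors (assumed measure, see lead-in)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical sample documents, already tokenised.
docs = [
    "open source search engine library".split(),
    "search engine library for indexing".split(),
    "mandelbrot set rendered on gpu".split(),
]
vecs = tfidf_vectors(docs)
```

<p style="margin-bottom:0.35cm;line-height:115%">With these sample documents, the first two share several terms and so score a higher cosine similarity with each other than either does with the third, which shares no terms with them.</p>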

<p style="margin-bottom:0.35cm;line-height:115%">Problems in the existing method:</p>

<p style="margin-bottom:0.35cm;line-height:115%">1. It does not perform any dimensionality reduction, so the number of features is very large.</p><p style="margin-bottom:0.35cm;line-height:115%">2. It makes no effort at feature selection. Even if it ran successfully, it would produce poor clusters.</p>
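<p style="margin-bottom:0.35cm;line-height:115%">To illustrate what a basic feature selection step could look like, the sketch below keeps only terms whose document frequency falls inside a chosen band. This is one simple, common approach and not necessarily the technique any particular paper recommends; the names <code>select_terms</code> and <code>project</code>, the thresholds <code>min_df</code> and <code>max_df_ratio</code>, and the sample documents are hypothetical.</p>

```python
from collections import Counter


def select_terms(docs, min_df=2, max_df_ratio=0.8):
    """Keep terms that occur in at least min_df documents and in at most
    max_df_ratio of all documents (illustrative thresholds)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t for t, c in df.items() if c >= min_df and c / n <= max_df_ratio}


def project(doc, vocab):
    """Drop every token that is not in the selected vocabulary."""
    return [t for t in doc if t in vocab]


# Hypothetical sample documents, already tokenised.
docs = [
    "the search engine indexes the documents".split(),
    "the search results are clustered".split(),
    "the engine ranks the results".split(),
    "the gpu renders the fractal".split(),
]
vocab = select_terms(docs)
reduced = [project(doc, vocab) for doc in docs]
```

<p style="margin-bottom:0.35cm;line-height:115%">Here twelve distinct terms shrink to three: terms seen in only one document are dropped as too rare, and "the" is dropped because it appears in every document.</p>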

<p style="margin-bottom:0.35cm;line-height:115%">Solution</p>

<p style="margin-bottom:0.35cm;line-height:115%">1. Apply a dimensionality reduction technique (DRT) that both reduces the number of features and selects the most relevant ones. [1]</p>

<p style="margin-bottom:0.35cm;line-height:115%">2. Implement a parallel clustering algorithm such as Buckshot, Suffix Tree Clustering, or Lingo. These clustering algorithms are better suited to web documents. [2]</p><p style="margin-bottom:0.35cm;line-height:115%">*Note: Lingo is an algorithm used in Carrot2 to cluster search results from Lucene and Solr.</p><p style="margin-bottom:0.35cm;line-height:115%"><span style="line-height:115%">I have yet to work out the exact method for solving this problem. Is the idea of using a parallel programming paradigm okay? I would love to discuss how to proceed further.</span></p>
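<p style="margin-bottom:0.35cm;line-height:115%">As a starting point for that discussion, here is a rough sketch of the Buckshot idea mentioned above: agglomerate a random sample of about sqrt(k*n) documents down to k clusters, then use their centroids as seeds for a K-Means-style assignment pass. It is a simplification under stated assumptions: clusters are merged by centroid similarity rather than group-average linkage, vectors are small dense lists, and all function names and sample data are hypothetical.</p>

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def buckshot_seeds(vectors, k, rng):
    """Agglomerate a random sample of about sqrt(k*n) vectors into k seed centroids."""
    n = len(vectors)
    size = min(n, max(k, math.ceil(math.sqrt(k * n))))
    clusters = [[v] for v in rng.sample(vectors, size)]
    while len(clusters) > k:
        # Merge the pair of clusters whose centroids are most similar
        # (a simplification of group-average linkage).
        pairs = (
            (a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        i, j = max(
            pairs,
            key=lambda p: cosine(centroid(clusters[p[0]]), centroid(clusters[p[1]])),
        )
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return [centroid(c) for c in clusters]


def assign(vectors, centers):
    """Label each vector with the index of its most similar centre."""
    return [
        max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
        for v in vectors
    ]


# Hypothetical 2-D document vectors forming two obvious groups.
vectors = [[1, 0.1], [0.9, 0.2], [1, 0], [0.1, 1], [0, 0.9], [0.2, 1]]
seeds = buckshot_seeds(vectors, k=2, rng=random.Random(0))
labels = assign(vectors, seeds)
```

<p style="margin-bottom:0.35cm;line-height:115%">The assignment pass treats every document independently, so it is the step that would most naturally be split across threads. The quadratic agglomeration only ever runs on the small sample.</p>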

<p style="margin-bottom:0.35cm;line-height:115%">I am very excited about this project and would be glad to work on it with full dedication and to complete each specified task before the deadline.</p>

<p style="margin-bottom:0.35cm;line-height:115%"><span style="line-height:115%"><br></span></p><p style="margin-bottom:0.35cm;line-height:115%"><span style="line-height:115%">[1] <a href="https://web.cs.dal.ca/~luo/AI2005.pdf">https://web.cs.dal.ca/~luo/AI2005.pdf</a></span><br></p><p style="margin-bottom:0.35cm;line-height:115%"><span style="line-height:14.9499998092651px">[2] <a href="http://project.carrot2.org/publications/wroblewski-2003-ahc.pdf">http://project.carrot2.org/publications/wroblewski-2003-ahc.pdf</a></span></p><p style="margin-bottom:0.35cm;line-height:115%"><span style="line-height:14.9499998092651px"> </span></p><p style="margin-bottom:0.35cm;line-height:115%"><span style="line-height:normal">-- </span></p><div class="gmail_signature">Thanks<br>Ganesh Prabu</div>

</div>