[Xapian-devel] GSoC 2014
nikharagrawal2006 at gmail.com
Fri Feb 14 16:49:39 GMT 2014
I am Nikhar Agrawal, a third-year Computer Science and Engineering student at
IIIT-H. I am fairly proficient in C++. I was a GSoC 2013 participant with the
Boost C++ Libraries and successfully merged my project into the Boost trunk.
As part of my course on Information Retrieval and Extraction, I did a
project on answering search queries over the latest 40 GB Wikipedia dump,
so I was excited to see so many projects on the Xapian ideas page that I
could identify with.
In summary, I used libxml++ to parse the wiki dump and built an index of
words (using a multi-way merge), with each word's posting list sorted in
decreasing order of TF-IDF. I then built a secondary index on top of it for
fast retrieval. For multiword queries, I used a simple Σ tf-idf ranking.
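To illustrate what a simple Σ tf-idf ranking over an inverted index looks like, here is a minimal C++17 sketch. It is not the code from that project: it assumes whitespace-tokenized documents held entirely in memory, uses hypothetical helper names (`build_index`, `rank`), and takes idf = log(N / df). The actual project built its index on disk with a multi-way merge and a secondary index, which this sketch leaves out.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One posting: (document id, frequency of the term in that document).
using Posting = std::pair<int, int>;
using InvertedIndex = std::map<std::string, std::vector<Posting>>;

// Hypothetical helper: build an in-memory inverted index from
// whitespace-tokenized documents. An index over a 40 GB dump would instead
// be written as sorted on-disk runs and combined with a multi-way merge.
InvertedIndex build_index(const std::vector<std::string>& docs) {
    InvertedIndex index;
    for (int doc_id = 0; doc_id < static_cast<int>(docs.size()); ++doc_id) {
        std::map<std::string, int> tf;  // term -> count within this document
        std::istringstream in(docs[doc_id]);
        std::string word;
        while (in >> word) ++tf[word];
        for (const auto& [term, count] : tf)
            index[term].emplace_back(doc_id, count);
    }
    return index;
}

// Hypothetical helper: rank documents for a multiword query by summing
// tf * idf over the query terms, with idf = log(N / df).
// Returns (doc id, score) pairs, highest score first.
std::vector<std::pair<int, double>> rank(const InvertedIndex& index,
                                         int num_docs,
                                         const std::vector<std::string>& query) {
    std::map<int, double> scores;  // doc id -> accumulated score
    for (const auto& term : query) {
        auto it = index.find(term);
        if (it == index.end()) continue;  // term absent from the collection
        const auto& postings = it->second;
        double idf = std::log(static_cast<double>(num_docs) / postings.size());
        for (const auto& [doc_id, tf] : postings)
            scores[doc_id] += tf * idf;
    }
    std::vector<std::pair<int, double>> ranked(scores.begin(), scores.end());
    std::sort(ranked.begin(), ranked.end(),
              [](const auto& a, const auto& b) { return a.second > b.second; });
    return ranked;
}
```

For example, with the documents "apple banana apple", "banana cherry" and "cherry cherry date", the query {apple, cherry} ranks document 0 first (score 2 · log 3), then document 2 (2 · log 1.5), then document 1 (log 1.5). Accumulating scores per document like this is the simplest form of term-at-a-time evaluation; posting lists pre-sorted by TF-IDF, as in the project, additionally allow stopping early once the remaining postings cannot change the top results.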
I would like to apply for GSoC 2014 as well, and Xapian seems like a great
place to learn more and to put into practice the theory I am learning in my
Information Retrieval and Extraction course.
How would you suggest I proceed?