[Xapian-devel] GSoC 2014

Nikhar Agrawal nikharagrawal2006 at gmail.com
Fri Feb 14 16:49:39 GMT 2014


Hi,
I am Nikhar Agrawal, currently studying in my third year at IIIT-H,
pursuing Computer Science and Engineering. I am fairly proficient in C++. I
was a GSoC 2013 participant with the Boost C++ Libraries, and my project was
successfully merged into Boost trunk.

As part of my course on Information Retrieval and Extraction, I did a
project on answering search queries over the latest 40 GB Wikipedia dump.
Hence, I was excited to see so many projects on the Xapian ideas page that
I could identify with.

To summarize: I used libxml++ to parse the Wikipedia dump. I built an
index of words (using a multi-way merge), each with its posting list
sorted in decreasing order of TF-IDF, and then built a secondary index on
top of it for fast retrieval. To rank results for multi-word queries, I
used a simple Σ tf-idf scoring scheme.
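
A minimal sketch of the Σ tf-idf ranking described above, for illustration
only. It is not the project's actual code: the function names, whitespace
tokenization, raw term counts as tf, and idf = log(N / df) are all
assumptions, and it keeps everything in memory, so the multi-way merge and
the secondary index are left out.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Map each term to a posting list of (doc_id, tf-idf), highest weight first.

    `docs` is a hypothetical {doc_id: text} mapping standing in for parsed pages.
    """
    # Term counts per document (assumed tokenization: lowercase, split on whitespace).
    term_counts = {doc_id: Counter(text.lower().split())
                   for doc_id, text in docs.items()}

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    index = defaultdict(list)
    for doc_id, counts in term_counts.items():
        for term, tf in counts.items():
            idf = math.log(n_docs / doc_freq[term])  # assumed idf variant
            index[term].append((doc_id, tf * idf))

    # Posting lists in decreasing order of tf-idf, as in the description.
    for postings in index.values():
        postings.sort(key=lambda p: p[1], reverse=True)
    return dict(index)


def search(index, query, top_k=10):
    """Score each document by the sum of tf-idf weights of the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda s: s[1], reverse=True)[:top_k]
```

For example, after index = build_index({1: "xapian search engine", 2: "boost
c++ libraries"}), the call search(index, "xapian search") returns a list of
(doc_id, score) pairs with the best match first. Keeping each posting list
sorted by weight means a larger system could stop reading a list early once
the remaining weights are too small to matter.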

I would like to apply for GSoC 2014 as well, and Xapian seems a great place
to learn more and to put into practice the theory I am learning in my
Information Retrieval and Extraction course.

How would you suggest I proceed?

Thanks.
Nikhar