[Xapian-devel] GSoC 2014

Olly Betts olly at survex.com
Fri Feb 28 12:43:53 GMT 2014

On Thu, Feb 27, 2014 at 01:11:24PM +0530, karthik iyer wrote:
> So my idea goes like this. Basically I have been working on Question
> Answering systems. I developed a QA system for "when" type questions (sorry
> I can't provide the source code at the moment because my paper is under
> review at SIGIR 2014). I used part-of-speech tags and developed a weighted
> scoring system.
> Now I basically plan on developing a generic QA system which encompasses a
> large number of questions. The biggest drawback of my previous QA system
> was the lack of relevance measuring mechanism. I want to develop a
> relevance measure between a query and a sentence. I believe there already
> exist many relevance-measuring implementations, but those relate a query
> to a document (as far as I know).

The term "document" is what the literature uses, but the mental image
that might conjure up of a multi-page printout with a staple through the
corner is misleading.  The "documents" being matched could be single
sentences.
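To make that concrete, here is a minimal pure-Python sketch (not Xapian's API) that treats each sentence as its own "document" and ranks sentences against a query with BM25. BM25 is the scheme Xapian weights with by default, but this code is not Xapian's implementation. The tokenizer, the function name `bm25_rank`, and the values k1=1.2, b=0.75 (common textbook choices) are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/digits; a deliberately crude tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(query, sentences, k1=1.2, b=0.75):
    """Rank sentences for a query, treating each sentence as one 'document'."""
    if not sentences:
        return []
    docs = [tokenize(s) for s in sentences]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: how many sentences contain each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scored = []
    for idx, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # a term absent from this sentence contributes nothing
            # Non-negative IDF variant (log of 1 + ...), so common terms never score below 0.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and sentence-length normalisation (b).
            tf_part = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf * tf_part
        scored.append((score, idx))
    # Highest score first; ties keep the original sentence order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(sentences[idx], score) for score, idx in scored]


if __name__ == "__main__":
    sentences = [
        "The conference was held in July.",
        "Search engines rank documents by relevance.",
        "A sentence can be indexed as its own document.",
    ]
    for sentence, score in bm25_rank("when was the conference held", sentences):
        print(f"{score:.3f}  {sentence}")
```

Nothing in the scoring depends on the units being long; in Xapian itself the equivalent would simply be to index each sentence as a separate document.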

> To develop a relevance measure I need to take
> into consideration a large number of sentences and questions so that a
> generic feature set can be formed which will further be employed in my ML
> algorithm. This needs a huge dataset of documents which I don't have due to
> a lack of financial support. I was planning to use the AQUAINT 2 dataset
> but it costs $500, which I cannot afford.
> Now if I am successful at building a relevance measuring system between a
> query and a sentence then I will take into consideration only those
> sentences that are relevant. Then I will apply my scoring system to those
> sentences which will help me select the final answer sentence. In my
> previous project I got an efficiency of ~74% tested on 200 test queries. I
> believe that with a proper relevance measure I can cross the 90% mark.
> Please give your suggestions on my project idea. It would be very helpful.

The first concern I have is whether this is something we actually have
the skills to mentor.  I personally don't have any previous experience
of Question Answering systems - I don't know about the other mentors.

I'm also unclear where Xapian fits into the picture.

Are you talking about building this as a new feature for Xapian?

Or is it a framework or application built on top of Xapian?

Or is it a separate system entirely?

