[Xapian-devel] GSoC 2014

karthik iyer karthikiyer2000 at gmail.com
Thu Feb 27 07:41:24 GMT 2014


Here is my idea. I have been working on Question Answering (QA) systems,
and I developed a QA system for "when"-type questions (sorry, I can't
provide the source code at the moment because my paper is under review at
SIGIR 2014). It uses part-of-speech information and a weighted scoring
system.
I now plan to develop a generic QA system that covers a much wider range
of question types. The biggest drawback of my previous QA system was the
lack of a relevance-measuring mechanism. I want to develop a relevance
measure between a query and a sentence. I believe many relevance measures
already exist, but as far as I know they relate a query to a document, not
to a sentence. To develop this measure I need to consider a large number
of sentences and questions, so that a generic feature set can be formed
and then used in my ML algorithm. This needs a huge dataset of documents,
which I don't have due to a lack of financial support. I was planning to
use the AQUAINT 2 dataset, but it costs $500, which I cannot afford.
If I succeed in building a relevance measure between a query and a
sentence, I will consider only the sentences that it marks as relevant. I
will then apply my scoring system to those sentences to select the final
answer sentence. In my previous project I got an efficiency of ~74% on 200
test queries. I believe that with a proper relevance measure I can cross
the 90% mark.
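
To make the two stages concrete, here is a minimal sketch in Python. It is
only an illustration under assumptions: plain TF-IDF cosine similarity
stands in for the learned relevance measure (which does not exist yet),
the threshold of 0.1 is arbitrary, and year_score is a hypothetical
placeholder for the weighted scoring system, not the actual one.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (a dict) per token list."""
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))
    # Smoothed IDF, so terms that occur everywhere keep a positive weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    return [
        {t: c * idf[t] for t, c in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def relevant_sentences(query, sentences, threshold=0.1):
    """Stage 1: keep (score, sentence) pairs at or above the threshold."""
    vectors = tfidf_vectors([tokenize(query)] + [tokenize(s) for s in sentences])
    q_vec, s_vecs = vectors[0], vectors[1:]
    scored = [(cosine(q_vec, v), s) for v, s in zip(s_vecs, sentences)]
    return [(score, s) for score, s in scored if score >= threshold]


def pick_answer(query, sentences, answer_score, threshold=0.1):
    """Stage 2: apply the answer scorer to the relevant sentences only."""
    candidates = relevant_sentences(query, sentences, threshold)
    if not candidates:
        return None
    return max(candidates, key=lambda c: answer_score(query, c[1]))[1]


def year_score(query, sentence):
    """Hypothetical scorer for "when" questions: count four-digit numbers."""
    return len(re.findall(r"\b\d{4}\b", sentence))
```

Calling pick_answer(query, sentences, year_score) returns the best
sentence among those that pass the relevance filter, or None if no
sentence passes. Keeping the two stages as separate functions means the
cosine measure could later be swapped for a trained model without
touching the answer scorer.
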
Please give your suggestions on my project idea. It would be very helpful.


On Wed, Feb 26, 2014 at 5:16 PM, Parth Gupta <pargup8 at gmail.com> wrote:

> The Letor project involves a decent amount of Machine Learning, while all
> the ranking-related projects are around IR. It's better to introduce your
> idea on the mailing list, where all the mentors can have a detailed look
> at it, potential mentors can respond, and the idea is kind of registered
> under your name.
> Cheers,
> Parth.
> On Wed, Feb 26, 2014 at 10:20 AM, Olly Betts <olly at survex.com> wrote:
>> On Tue, Feb 25, 2014 at 03:58:09PM +0530, karthik iyer wrote:
>> >     I am C Karthik Iyer, a 3rd year B Tech student at NITK Surathkal.
>> > I am interested in working on projects on Information Retrieval and
>> > Machine Learning. I have previous experience working on projects
>> > regarding Question Answering Systems.
>> >     I have a project idea which includes both IR and ML, but I don't
>> > know how feasible the idea is. Could you say when you will be
>> > available on IRC so that I can discuss the idea with you?
>> I can't say for certain when I'll be monitoring IRC, but I'm in UTC+13.
>> Other mentors are in a variety of timezones.
>> If the idea is complex, email might be better though.
>> Cheers,
>>     Olly