<div dir="ltr"><div><div>Hi,<br><br>Frankly, I am not yet sure how this will fit into the Xapian toolkit; I was hoping you could help me with that. I haven't gone through the Xapian documentation properly yet. I will go through it and reply in a couple of days on how my idea could (if possible) fit into your toolkit.<br>

<br></div>Regards<br></div>Karthik <br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Feb 28, 2014 at 6:13 PM, Olly Betts <span dir="ltr">&lt;<a href="mailto:olly@survex.com" target="_blank">olly@survex.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On Thu, Feb 27, 2014 at 01:11:24PM +0530, karthik iyer wrote:<br>

> So my idea goes like this. Basically I have been working on Question<br>

> Answering systems. I developed a QA system for "when" type questions (sorry<br>

> I can't provide the source code at the moment because my paper is under<br>

> review at SIGIR 2014). I used the part-of-speech and developed a weighted<br>

> scoring system.<br>

> Now I basically plan on developing a generic QA system which encompasses a<br>

> large number of questions. The biggest drawback of my previous QA system<br>

> was the lack of a relevance-measuring mechanism. I want to develop a<br>

> relevance measure between a query and a sentence. I believe there already<br>

> exist many relevance measuring codes but those relate a query to a<br>

> document (as far as I know).<br>

<br>

</div>The term "document" is what the literature uses, but the mental image<br>

that might conjure up of a multi-page printout with a staple through the<br>

corner is misleading.  The "documents" being matched could be single<br>

sentences.<br>

<div class=""><br>

> To develop a relevance measure I need to take<br>

> into consideration a large number of sentences and questions so that a<br>

> generic feature set can be formed which will further be employed in my ML<br>

> algorithm. This needs a huge dataset of documents which I don't have due to<br>

> lack of any financial support. I was planning to use the AQUAINT 2 dataset<br>

> but it costs $500 which I cannot afford.<br>

> Now if I am successful at building a relevance measuring system between a<br>

> query and a sentence then I will take into consideration only those<br>

> sentences that are relevant. Then I will apply my scoring system to those<br>

> sentences which will help me select the final answer sentence. In my<br>

> previous project I got an efficiency of ~74% tested on 200 test queries. I<br>

> believe that with a proper relevance measure I can cross the 90% mark.<br>

> Please give your suggestions on my project idea. It would be very helpful.<br>

<br>

</div>The first concern I have is whether this is something we actually have<br>

the skills to mentor.  I personally don't have any previous experience<br>

of Question Answering systems - I don't know about the other mentors.<br>

<br>

I'm also unclear where Xapian fits into the picture.<br>

<br>

Are you talking about building this as a new feature for Xapian?<br>

<br>

Or is it a framework or application built on top of Xapian?<br>

<br>

Or is it a separate system entirely?<br>

<br>

Cheers,<br>

    Olly<br>

</blockquote></div><br></div>