[Xapian-devel] New Idea on Ranking in IR

Parth Gupta parthg.88 at gmail.com
Fri Apr 1 10:18:28 BST 2011


Hello,

I would like to discuss an idea on ranking in IR systems which I think could
be a good extension to Xapian. If I am not too late to discuss it, please
consider it. First, some brief background about me: I am a Masters student
working on my thesis in Information Retrieval. Just today I received a mail
about GSoC, and more precisely Xapian, from a professor in Europe whom I am
going to join for my Ph.D.

Generally, ranking is unsupervised: the rank list is produced from the scores
given by a ranking function such as BM25 or TF-IDF, and the documents are
returned in decreasing order of score.

Learning to rank, by contrast, involves supervised learning. If we can
extract features for pairs of a query and an initially retrieved document,
then we can learn which document should come above which. Basically, a search
engine needs the relevant documents at the top, because the user generally
never bothers to click through to the next page of results; rather, he
chooses to modify the query.

In Learning to Rank (Letor) we prepare features which represent a
query-document pair. So after the initial retrieval we take, say, the first
20 or 30 documents and represent them as feature vectors. Then, based on the
training data, the supervised learner gives a score to each document for a
particular query. For example, if the learning is done by regression, then we
have to learn a weight vector 'W' which gives a score to a document's feature
vector by dot product.
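
To make the dot-product scoring concrete, here is a minimal sketch in Python.
It is only an illustration, not Xapian code: the names `dot` and `rerank`, the
weight values, and the three-element feature vectors are all made up for this
example, and in practice 'W' would come from training rather than being
written by hand.

```python
def dot(w, x):
    """Dot product of a weight vector and a feature vector."""
    if len(w) != len(x):
        raise ValueError("weight and feature vectors differ in length")
    return sum(wi * xi for wi, xi in zip(w, x))


def rerank(candidates, weights):
    """Re-rank the top documents from an initial retrieval.

    candidates: list of (doc_id, feature_vector) pairs (hypothetical format).
    weights:    the learned vector W (hand-picked here, purely illustrative).
    Returns (doc_id, score) pairs in decreasing order of score.
    """
    scored = [(doc_id, dot(weights, feats)) for doc_id, feats in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Illustrative values only: three features per query-document pair.
weights = [0.5, 1.0, 2.0]
candidates = [
    ("doc_a", [3.0, 0.2, 0.1]),
    ("doc_b", [1.0, 0.9, 0.8]),
    ("doc_c", [2.0, 0.1, 0.0]),
]
ranking = rerank(candidates, weights)
for doc_id, score in ranking:
    print(doc_id, score)
```

With these made-up numbers doc_b moves to the top even though doc_a has the
largest first feature, which is the kind of reordering the learned weights are
meant to produce.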

Here the features can be term frequency, TF-IDF score, BM25 score, etc.; as
many as are useful. For the learning step there are many machine learning
techniques available.
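
As a rough sketch of what such a feature vector could look like, the Python
below computes three features for one query-document pair over a tiny
in-memory corpus. Everything here is an assumption for illustration: the
function name `features`, the token-list corpus format, the parameter values
k1 = 1.2 and b = 0.75, and the particular BM25/IDF variant used, which is one
common textbook form and not necessarily the formula Xapian implements.

```python
import math
from collections import Counter


def features(query_terms, doc_tokens, corpus, k1=1.2, b=0.75):
    """Return [summed term frequency, TF-IDF score, BM25 score].

    corpus is a list of token lists (hypothetical format); it is assumed
    to be non-empty and to contain at least one non-empty document.
    """
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_tokens)

    sum_tf = 0.0
    tfidf = 0.0
    bm25 = 0.0
    for term in query_terms:
        f = tf[term]                                  # term frequency in this document
        df = sum(1 for d in corpus if term in d)      # document frequency
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        sum_tf += f
        tfidf += f * idf
        # Length-normalised BM25 term weight.
        denom = f + k1 * (1 - b + b * len(doc_tokens) / avg_len)
        bm25 += idf * f * (k1 + 1) / denom
    return [sum_tf, tfidf, bm25]


# Toy corpus, purely illustrative.
corpus = [
    ["xapian", "search", "engine"],
    ["learning", "to", "rank"],
    ["rank", "search", "results"],
]
query = ["rank"]
vectors = [features(query, doc, corpus) for doc in corpus]
for vec in vectors:
    print(vec)
```

Each resulting vector is what the learner would see for one query-document
pair; more features can be appended in the same way.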

Regards,
Parth Gupta,
M.Tech Candidate,
DA-IICT, Gandhinagar,
India.