[Xapian-devel] Proposal Outline

Mayank Chaudhary mayankchaudhary.iitr at gmail.com
Wed Mar 12 21:47:34 GMT 2014


Well, I found an interesting graph in the paper "Feature Selection for
Ranking".

[image: graph from "Feature Selection for Ranking" showing MAP against the
number of selected features; PNG at
<http://lists.xapian.org/pipermail/xapian-devel/attachments/20140313/8e1ba2f4/attachment-0001.png>]

The evaluation metric (MAP) roughly doubles when the number of features is
reduced from 15 to 1. That's a really motivating case for implementing a
feature selection algorithm. I was wondering if we could add more features
to the feature vector, so that we have a considerable number of features to
select from.
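
To make the metric concrete, here is a minimal, self-contained sketch of
how MAP could be computed (the relevance judgements below are made up for
illustration, not taken from the paper):

    #include <iostream>
    #include <vector>

    // Average precision for one query: mean of precision@k over the ranks
    // k at which a relevant document appears (this variant normalises by
    // the number of relevant documents retrieved).
    double average_precision(const std::vector<bool>& rel) {
        double hits = 0.0, sum = 0.0;
        for (size_t k = 0; k < rel.size(); ++k) {
            if (rel[k]) {
                ++hits;
                sum += hits / (k + 1);
            }
        }
        return hits ? sum / hits : 0.0;
    }

    int main() {
        // Made-up relevance of the ranked results for two queries.
        std::vector<std::vector<bool>> queries = {
            {true, false, true, false},   // AP = (1/1 + 2/3) / 2 = 0.833
            {false, true, false, false},  // AP = (1/2) / 1       = 0.500
        };
        double map = 0.0;
        for (const auto& q : queries) map += average_precision(q);
        std::cout << "MAP = " << map / queries.size() << '\n';  // 0.667
    }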


On Wed, Mar 12, 2014 at 1:10 AM, Mayank Chaudhary <
mayankchaudhary.iitr at gmail.com> wrote:

> Hi,
>
> Before starting my proposal, I wanted to know what the expected output
> of the Letor module is. Is it meant for transfer learning (i.e. learning
> from one dataset and leveraging that to predict rankings on another
> dataset), or for supervised learning on training data the user supplies?
>
> For instance, Xapian currently powers the Gmane search, which by default
> uses the BM25 weighting scheme. Now suppose we want to use LETOR to
> re-rank the top k retrieved search results, taking SVMRanker as an
> example. Will it rank Gmane's search results based on weights learned
> from the INEX dataset, given that the client won't be providing any
> training file? I also don't think it will perform well across two
> datasets with different distributions. So how are we going to use it?
> (A rough sketch of the usage I have in mind follows.)
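>
> The retrieval calls here are standard Xapian; the reranking step is a
> hypothetical placeholder (LetorModel and rerank are invented names),
> since the real API is exactly what I'm asking about:
>
>     #include <iostream>
>     #include <xapian.h>
>
>     int main() {
>         // Retrieve the top k results with the default BM25 weighting.
>         Xapian::Database db("/path/to/gmane/db");  // hypothetical path
>         Xapian::Enquire enquire(db);
>         enquire.set_query(Xapian::Query("xapian"));
>         Xapian::MSet mset = enquire.get_mset(0, 100);
>
>         // Hypothetical reranking step -- names invented for illustration:
>         //     LetorModel model = LetorModel::load("inex-svm.model");
>         //     rerank(mset, model);
>
>         for (Xapian::MSetIterator it = mset.begin(); it != mset.end(); ++it)
>             std::cout << *it << '\n';  // docids in BM25 order
>     }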
>
> PROPOSAL-
>
> 1. Sorting out the Letor API will include -
>
>    - Implementing SVMRanker and checking its evaluation results against
>    the previously generated values.
>
>
>    - Implementing evaluation methods, including MAP and NDCG. (*Is there
>    any other method in particular that should be implemented besides
>    these two?* A minimal NDCG sketch follows this list.)
>
>
>    - Checking the performance of ListMLE and ListNet against SVMRanker.
>    (*Assuming both ListMLE and ListNet have been implemented correctly,
>    we still don't have any tested performance measurements for these two
>    algorithms, so I want to know what the course of action should be
>    here.*)
>
>
>    - Implementing a rank aggregator. I've read about the *Kemeny-Young
>    method* (rough sketch below). Can you provide the names of the
>    algorithms that should be implemented here, or what was proposed the
>    year before last? Also, is there a way to check an aggregated
>    ranking's performance (*since the INEX dataset doesn't provide
>    rankings to aggregate*)?
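>
> Regarding Kemeny-Young, this is my rough understanding as a brute-force
> sketch (the input rankings are made up; it enumerates all permutations,
> so it's only feasible for a handful of candidates):
>
>     #include <algorithm>
>     #include <iostream>
>     #include <vector>
>
>     // Kendall tau distance: number of candidate pairs that perm and
>     // ranking order differently.
>     int tau(const std::vector<int>& perm, const std::vector<int>& ranking) {
>         int n = perm.size();
>         std::vector<int> pos(n);
>         for (int i = 0; i < n; ++i) pos[ranking[i]] = i;
>         int d = 0;
>         for (int i = 0; i < n; ++i)
>             for (int j = i + 1; j < n; ++j)
>                 if (pos[perm[i]] > pos[perm[j]]) ++d;
>         return d;
>     }
>
>     int main() {
>         // Rankings of candidates {0,1,2} from three hypothetical rankers.
>         std::vector<std::vector<int>> input = {{0,1,2}, {1,0,2}, {0,2,1}};
>         std::vector<int> perm = {0, 1, 2}, best = perm;
>         int best_cost = -1;
>         do {  // Kemeny-Young: the permutation with minimal total distance.
>             int cost = 0;
>             for (const auto& r : input) cost += tau(perm, r);
>             if (best_cost < 0 || cost < best_cost) {
>                 best_cost = cost;
>                 best = perm;
>             }
>         } while (std::next_permutation(perm.begin(), perm.end()));
>         std::cout << best[0] << best[1] << best[2]
>                   << " (cost " << best_cost << ")\n";  // 012 (cost 2)
>     }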
>
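> As for the evaluation methods item above, a minimal NDCG@k sketch, with
> made-up relevance grades and the common 2^rel - 1 gain with log2
> discounting (MAP would follow the same pattern as my earlier sketch):
>
>     #include <algorithm>
>     #include <cmath>
>     #include <functional>
>     #include <iostream>
>     #include <vector>
>
>     // DCG@k with gain 2^rel - 1 and discount log2(rank + 1).
>     double dcg(const std::vector<int>& rel, size_t k) {
>         double sum = 0.0;
>         for (size_t i = 0; i < k && i < rel.size(); ++i)
>             sum += (std::pow(2.0, rel[i]) - 1) / std::log2(i + 2.0);
>         return sum;
>     }
>
>     int main() {
>         std::vector<int> rel = {3, 1, 0, 2};  // grades in ranked order
>         std::vector<int> ideal = rel;         // best possible ordering
>         std::sort(ideal.begin(), ideal.end(), std::greater<int>());
>         std::cout << "NDCG@4 = " << dcg(rel, 4) / dcg(ideal, 4) << '\n';
>     }
>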
> 2. Implementing automated tests will include -
>
>    - For testing, 20 documents and 5 queries can be picked from the INEX
>    dataset, run through the rankers, and checked against their expected
>    outputs (a rough test sketch follows this list).
>
>
>    - The implemented evaluation metrics can also be used to test the
>    learning algorithms.
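>
> Concretely, I imagine each automated test looking something like this
> (plain asserts for illustration; the database path and expected docids
> are hypothetical, and the reranking step is omitted since that API is
> still under discussion):
>
>     #include <algorithm>
>     #include <cassert>
>     #include <vector>
>     #include <xapian.h>
>
>     // Run a fixed query over the small sample database and compare the
>     // resulting docid order with a hand-checked expected order.
>     void test_ranking_basic() {
>         Xapian::Database db("testdata/inex-sample.db");  // hypothetical
>         Xapian::Enquire enquire(db);
>         enquire.set_query(Xapian::Query("information"));
>         Xapian::MSet mset = enquire.get_mset(0, 20);
>
>         std::vector<Xapian::docid> got;
>         for (Xapian::MSetIterator it = mset.begin(); it != mset.end(); ++it)
>             got.push_back(*it);
>
>         // Hand-checked for the 20-document sample (hypothetical values).
>         std::vector<Xapian::docid> expected = {4, 1, 17, 9};
>         assert(got.size() >= expected.size());
>         assert(std::equal(expected.begin(), expected.end(), got.begin()));
>     }
>
>     int main() { test_ranking_basic(); }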
>
> 3. Implementing a feature selection algorithm -
>
>    - I have a question here: why are we planning to implement a feature
>    selection algorithm when we have only 19 features in the feature
>    vector? I don't think a ranker will over-fit the dataset with so few.
>    Also, from what I have learnt, feature selection algorithms (like PCA
>    in classification) are used only for time or space efficiency. (My
>    mental model of the step is sketched below.)
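>
> For what it's worth, my mental model of the selection step is a simple
> greedy wrapper like the following, where eval_map() is a toy stand-in
> for training a ranker on a feature subset and measuring validation MAP:
>
>     #include <iostream>
>     #include <set>
>
>     // Toy stand-in: a real version would train on the subset and return
>     // validation MAP. Here we pretend low-numbered features help more.
>     double eval_map(const std::set<int>& features) {
>         double s = 0.0;
>         for (int f : features) s += 1.0 / (f + 1);
>         return s;
>     }
>
>     // Greedy forward selection: repeatedly add whichever remaining
>     // feature improves the validation score the most.
>     std::set<int> select_features(int n_features, int target_size) {
>         std::set<int> chosen;
>         while ((int)chosen.size() < target_size) {
>             int best = -1;
>             double best_score = -1.0;
>             for (int f = 0; f < n_features; ++f) {
>                 if (chosen.count(f)) continue;
>                 std::set<int> trial(chosen);
>                 trial.insert(f);
>                 double score = eval_map(trial);
>                 if (score > best_score) { best_score = score; best = f; }
>             }
>             chosen.insert(best);
>         }
>         return chosen;
>     }
>
>     int main() {
>         for (int f : select_features(19, 5)) std::cout << f << ' ';
>         std::cout << '\n';  // the toy scorer picks 0 1 2 3 4
>     }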
>
> Please do provide some feedback so that I can improve upon it.
>
> -Mayank
>