[Xapian-devel] Proposal Outline
Mayank Chaudhary
mayankchaudhary.iitr at gmail.com
Wed Mar 12 21:47:34 GMT 2014
Well, I found an interesting graph in the paper "Feature Selection for
Ranking".
[Figure: MAP plotted against the number of selected features]
The evaluation metric (MAP) roughly doubles when the number of features is
reduced from 15 to 1. That makes implementing the feature selection
algorithm really motivating. I was wondering whether we could add more
features to the feature vector, so that we have a considerable number of
features to select from.
On Wed, Mar 12, 2014 at 1:10 AM, Mayank Chaudhary <
mayankchaudhary.iitr at gmail.com> wrote:
> Hi,
>
> Before starting my proposal, I wanted to know what the expected output of
> the Letor module is. Is it meant for transfer learning (i.e. you learn from
> one dataset and leverage that to predict the rankings of another dataset),
> or for ordinary supervised learning?
>
> For instance, Xapian currently powers the Gmane search, which by default
> uses the BM25 weighting scheme. Suppose we now want to use LETOR to rank
> the top k retrieved search results, taking SVMRanker as an example. Will it
> rank Gmane's search results based on weights learned from the INEX dataset,
> given that the client won't be providing any training file? I also don't
> think it will perform well across two datasets with different
> distributions. So how are we going to use it?
>
> PROPOSAL-
>
> 1. Sorting out the Letor API will include -
>
> - Implementing SVMRanker and checking its evaluation results against
> the previously generated values.
>
>
> - Implementing evaluation methods, specifically MAP and NDCG. (*Is
> there any other metric in particular that should be implemented besides
> these two?*)
>
>
> - Checking the performance of ListMLE and ListNet against SVMRanker. (*Both
> ListMLE and ListNet are assumed to have been implemented correctly, but we
> don't have any tested performance measurements for them, so what should the
> course of action be here?*)
>
>
> - Implementing a rank aggregator. I've read about the *Kemeny-Young
> method*. Could you give me the names of the algorithms that should be
> implemented here, or point me to what was proposed the year before last?
> Also, is there a way to check an aggregator's performance (*since the INEX
> dataset doesn't provide reference rankings*)?
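The two metrics in the evaluation bullet above are easy to sketch. A minimal Python reference implementation of the usual MAP and NDCG formulas (binary relevance labels for MAP, graded relevance gains for NDCG; the Letor module itself is C++, so this is illustration only, not the proposed API):

```python
import math

def average_precision(relevances):
    """AP for one ranked list of binary relevance labels (1 = relevant)."""
    hits, precision_sum = 0, 0.0
    for rank, rel in enumerate(relevances, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(runs):
    """MAP: mean of AP over several queries' ranked relevance lists."""
    return sum(average_precision(r) for r in runs) / len(runs)

def ndcg(gains, k=None):
    """NDCG@k for one ranked list of graded relevance gains."""
    k = k or len(gains)
    def dcg(g):
        # Gain discounted by log2 of (1-based rank + 1).
        return sum((2 ** rel - 1) / math.log2(i + 2) for i, rel in enumerate(g[:k]))
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal else 0.0
```

A list that is already in ideal order scores NDCG 1.0, which also gives a cheap sanity check for the automated tests in point 2.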
>
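On rank aggregation: the Kemeny-Young method picks the consensus ordering that minimises the total number of pairwise disagreements (Kendall tau distance) with the input rankings. A brute-force Python sketch of that idea; exact Kemeny aggregation is NP-hard, so this is only usable for tiny candidate sets, and a real aggregator would need a heuristic:

```python
from itertools import permutations

def kendall_tau_distance(order, ranking):
    """Count pairs ordered one way in `order` and the other way in `ranking`."""
    pos = {item: i for i, item in enumerate(ranking)}
    n = len(order)
    return sum(1
               for i in range(n)
               for j in range(i + 1, n)
               if pos[order[i]] > pos[order[j]])

def kemeny_young(rankings):
    """Brute-force Kemeny consensus: the permutation of the items that
    minimises the summed Kendall tau distance to all input rankings."""
    items = rankings[0]
    best = min(permutations(items),
               key=lambda order: sum(kendall_tau_distance(order, r)
                                     for r in rankings))
    return list(best)
```

For example, aggregating two copies of a-b-c with one b-a-c yields a-b-c, since it disagrees with the inputs on only one pair.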
> 2. Implementing automated tests will include -
>
> - For testing, 20 documents and 5 queries can be picked from the INEX
> dataset, run through the system, and checked against their expected outputs.
>
>
> - The implemented evaluation metrics can also be used to test the
> learning algorithms.
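The fixture idea above could be sketched like this; `check_ranker` and `overlap_ranker` are hypothetical stand-ins for illustration, not part of the Letor API:

```python
def check_ranker(rank_fn, fixtures):
    """Run rank_fn over (query, documents, expected_order) fixtures and
    collect every case where the produced order differs from the expected."""
    failures = []
    for query, docs, expected in fixtures:
        got = rank_fn(query, docs)
        if got != expected:
            failures.append((query, got, expected))
    return failures

def overlap_ranker(query, docs):
    """Toy stand-in ranker: order documents by how many query terms they contain."""
    terms = set(query.split())
    return sorted(docs, key=lambda d: -len(terms & set(d.split())))
```

An empty failure list means the ranker reproduced every expected ordering; the same harness shape works whichever ranker (SVMRanker, ListMLE, ListNet) is plugged in.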
>
> 3. Implementing a feature selection algorithm -
>
> - I have a question here: why are we planning to implement a feature
> selection algorithm when we have only 19 features? I don't think that will
> over-fit the dataset. Also, from what I have learnt, feature selection
> algorithms (like PCA in classification) are used mainly for time or space
> efficiency.
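Even with few features, selection can still drop redundant ones. A toy greedy sketch in the spirit of the GAS-style selection described in "Feature Selection for Ranking" (pick the most important remaining feature, then discount features similar to it); the importance and similarity scores below are invented inputs, not actual Letor features:

```python
def greedy_select(importance, similarity, k):
    """Greedy forward feature selection.
    importance: {feature: score}; similarity: {(f, g): redundancy penalty};
    k: number of features to keep."""
    remaining = dict(importance)
    chosen = []
    while remaining and len(chosen) < k:
        # Take the currently most important feature ...
        best = max(remaining, key=remaining.get)
        chosen.append(best)
        del remaining[best]
        # ... then penalise features that are redundant with it.
        for f in remaining:
            remaining[f] -= similarity.get((best, f),
                                           similarity.get((f, best), 0.0))
    return chosen
```

With a high similarity between two strong features, the second-strongest gets discounted and a weaker but less redundant feature is kept instead, which is exactly the behaviour that could let a 1-feature model beat a 15-feature one in the graph above.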
>
> Please do provide some feedback so that I can improve upon it.
>
> -Mayank
>