[Xapian-devel] [GSoC2012] Learning to Rank: few thoughts/issues

Wed Apr 4 09:27:17 BST 2012

Hi Parth,

Yes, I'll be using the features learnt using deep learning techniques will
be in addition to the traditional IR features and all of these
features(traditional+deep learning based features learnt in unsupervised
manner) will be fed to the ListMLE algorithm(which I will be implementing).
The Deep Learning algorithms are able to capture, to some extent, the
underlying generative factors that explain the variations in the input
data, and thus the learned representations help in disentangling the
underlying factors of variation. Such kind of features have been used in
this ICML'11 paper<http://eprints.pascal-network.org/archive/00008597/01/342_icmlpaper.pdf>for
domain adaptation in text articles.

I discussed the same setting (as I proposed in my last mail) with Hang
Li<http://research.microsoft.com/en-us/people/hangli/>sir(the author
of various Learning to Rank algorithms) and he says:
"*To me, ListMLE is a very elegant model and since it is a log linear
model, it appears to have a good match with deep learning.*"
[Parth: I have forwarded the mail to you for your kind reference.]

To sum up, in addition to the a subset of 134 of MSR's LETOR features,
we'll use the unsupervised deep learning features to train our model based
on ListMLE algorithm. As each of this would be implemented in a modular
fashion, if we get bad results(which shouldn't happen), we'll eliminate
that module(deep learning one) and keep the remaining; if we get better
results(which we should), we'll have a research contribution to make to the
community because Deep Learning techniques haven't been applied to ranking
systems yet.

Regards,
Rishabh.

On Wed, Apr 4, 2012 at 1:19 PM, Parth Gupta <parthg.88 at gmail.com> wrote:

> Hi Rishabh,
>
> If I have understood correctly ( thought I am not sure), do you want to
> extract features using some of the deep learning techniques in the
> semi-supervised setting of Letor. Can you give me some idea about what
> these features are?
>
> Are you planning to implement ListMLE in addition to this?
>
> Parth.
>
>
> On Mon, Apr 2, 2012 at 10:03 PM, Rishabh Mehrotra <erishabh at gmail.com>wrote:
>
>> Thanks Parth for your inputs. I had gone through the 134 or so feature
>> list at MSR's LETOR site for the ranking algorithms. The traditional IR
>> features I was talking about in my previous mail were referring to those
>> ones.
>>
>> Thanks for the heads-up on the unsupervised features part. I will be more
>> inclined on using one of the recently famous *deep learning architectures
>> * for unsupervised feature extraction simply because they have been
>> shown to outperform hand-crafted feature based state-of-the-art algorithms.
>> PFA a snapshot<Deep Learning.png> from one of the tutorials by Andrew Ng
>> highlighting the same fact.
>>
>> I would like to discuss about the probable methodology regarding the same.
>>
>> *Methodology:*
>> In order to do unsupervised feature extraction from the documents, I'll
>> use Denoising Stacked Autoencoders. Its a 2 stage process:
>>
>>    - Unsupervised pre-training
>>    - Supervised fine tuning
>>
>> *Unsupervised pre-training:*
>>
>> The denoising autoencoders[link<http://ufldl.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity>]
>> can be stacked to form a deep network by feeding the latent representation
>> (output code) of the denoising auto-encoder found on the layer below as
>> input to the current layer. The *unsupervised pre-training* of such an
>> architecture is done one layer at a time. Each layer is trained as a
>> denoising auto-encoder by minimizing the reconstruction of its input (which
>> is the output code of the previous layer). Once the first *n* layers are
>> trained, we can train the *n+1-th* layer because we can now compute the
>> code or latent representation from the layer below.
>> The kind of representation learnt by this step is a rich representation
>> which has shown to be better than hand-crafted features/representations.
>>
>>
>> *Supervised fine tuning:*
>>
>> Once all layers are pre-trained, the network goes through a second stage
>> of training called fine-tuning. Generally we consider supervised
>> fine-tuning where we want to minimize prediction error on a supervised
>> task.
>> *For our problem statement(ranking):* we would use the features
>> generated in the unsupervised pre-training stage and add the previously
>> used IR features(a subset of the 136 LETOR features) and feed this to
>> ListMLE or other Learning to Rank algorithm.
>>
>> To sum up, we would perform unsupervised pre-training to get features for
>> all the documents(unlabeled as well as labeled) and then proceed to do
>> supervised fine tuning only on the labeled data/documents. This way we
>> would use both labelled and unlabeled data to learn features to represent
>> documents. I have worked a bit on a related project, so the integration of
>> such a combination (Unsupervised feature extraction+ListMLE) into xapian
>> shouldn't be a tough asking(hopefully) as a GSoC project.
>>
>>
>> Your inputs are welcome. Thanks for your time.
>>
>> -
>> Rishabh.
>>
>> PS: A lot of hot-research is going on in the Deep Learning field. Here's
>> a link <http://deeplearningworkshopnips2011.wordpress.com/> to a
>> NIPS'11(Tier-1 in ML) workshop on the same for everyone's reference.
>>
>>
>> On Mon, Apr 2, 2012 at 3:52 PM, Parth Gupta <parthg.88 at gmail.com> wrote:
>>
>>> Hello Rishabh,
>>>
>>> Good to hear from you. Its never late to jump-in for GSoC.
>>>
>>>  *Doubt1:*
>>>>
>>>> *Feature Extraction/Selection:*
>>>> The various datasets listed on MSR's LETOR have a limited set of
>>>> features. Current implementation in xapian's LETOR has 5
>>>> features[tf,idf,doc_len,coll_tf,coll_len]. While algorithms for learning
>>>> ranking models have been intensively studied, this is not the case for
>>>> feature selection, despite of its importance. In a paper presented at
>>>> SIGIR'07 [Tier1 in IR domain], the authors have highlighted the
>>>> effectiveness of feature selection methods for ranking tasks.[link<http://research.microsoft.com/en-us/people/tyliu/fsr.pdf>]
>>>> I believe that apart from the traditional/cliched IR features, we should
>>>> * incorporate new features* to improve the performance of the LETOR
>>>> module.
>>>>
>>>>
>>> There is no point denying the fact that there is a need for more
>>> features. If you have noticed on the GSoC idea page of Letor, it says "The
>>> project can also include some work on the features, like adding support for
>>> more features, selecting a subset of features, etc." Now the point comes,
>>> which features you want to incorporate. The Letor datasets are growing
>>> enormously in terms of number of features [Letor MSR 46 ->136 , Yahoo
>>> Dataset 700]. It would make sense to incorporate those features which can
>>> be tracked and suits the environment. More over majority of the features
>>> dwell around the IR measures like bm25, TF, IDF, LM and different
>>> combination of them for different part of the document. Some of the other
>>> features of Letor Datasets are number of outgoing links, number of incoming
>>> links, page rank, number of children [1,2]. These features are valid and
>>> available only in the linked data and moreover, straight forward to
>>> compute. Yahoo dataset does not even declare the features because of the
>>> proprietary issues. But I think it also includes some features like the age
>>> of the page, number of clicks on it, total time spent, and so on.
>>>
>>> [1]
>>> http://research.microsoft.com/en-us/um/beijing/projects/letor/LETOR4.0/Data/Features_in_LETOR4.pdf
>>> [2] http://research.microsoft.com/en-us/projects/mslr/feature.aspx
>>>
>>>>
>>>> *Using unlabeled data:*
>>>> Over the last 3-4 years a lot of papers have identified the importance
>>>> of using unlabeled data to assist the task at hand by using it during
>>>> feature extraction stage. Andrew Ng proposed a Self-Taught learning
>>>> framework[ICML'07 paper<http://ai.stanford.edu/%7Ehllee/icml07-selftaughtlearning.pdf>]
>>>> wherein they make use of unlabeled data to improve performance. A very
>>>> recent paper at ICML'11<http://eprints.pascal-network.org/archive/00008597/01/342_icmlpaper.pdf>used the advantage of feature learning using unlabeled data and beat the
>>>> state-of-the-art in sentiment classification.
>>>>
>>>> Combining the above two points, I suggest an approach which uses
>>>> features learnt from data in an unsupervised fashion "*in addition to*"
>>>> the commonly used features.
>>>> *Please note:* all this is in addition to the traditional features and
>>>> finally we would be using *listwise/pairwise approaches*[ListMLE, et
>>>> cetera] to train our models on the new set of features. Please let me know
>>>> if this sounds good.
>>>>
>>>>
>>> This phenomenon, Semi-supervised ranking, is indeed interesting. If you
>>> want to incorporate it, feel free to discuss the plan.
>>>
>>>
>>>> *Doubt2:*
>>>>
>>>> *Rank Aggregation:*
>>>> Now that Xappian will have >1 Learning to rank algorithms, we should
>>>> look into some kind of rank aggregation as well: combining outputs by
>>>> various algorithms to get a final rank ordering for results. I went though
>>>> a ECML'07 paper on unsupervised method for the same[link<http://l2r.cs.uiuc.edu/%7Edanr/Papers/KlementievRoSm07.pdf>].
>>>> I haven't yet completely understood their approach but will do so by the
>>>> end of day.
>>>>
>>>>
>>> Rank Aggregation, is another LTR approach with a set of ranked lists at
>>> hand for the query. At the moment Xapian can have 2 ranked list, BM25 and
>>> SVM based LTR scheme. I think these techniques will produce better results
>>> with the input of more number of ranked list than Xapian can offer at the
>>> moment. But it would be interesting to explore after some more ranking
>>> schemes incorporation.
>>>
>>>
>>>>
>>>> *Modularity:*
>>>> Developing such modules in a modular fashion such that its not
>>>> necessary to use all of them all the times, would be good. Whenever the
>>>> user feels that in addition to basic features, he/she could use additional
>>>> features, the feature extraction module could be plugged in. Same for rank
>>>> aggregation.
>>>>
>>>
>>> Agreed and that in fact, that will be the goal.
>>>
>>> Best,
>>> Parth.
>>>
>>>
>>>> *Relevant Background:*
>>>> I have worked on few research oriented projects in Machine Learning,
>>>> but most of them involved coding in Matlab/Java. More details about me: [
>>>> link <http://www.rishabhmehrotra.com/index.htm>].
>>>> I have been working on a project on Topic Modeling(using Latent
>>>> Dirichlet Allocation) for Tweets. Link<http://code.google.com/p/tweettrends/>of the code on Google code. Also, I am involved in a collage project on
>>>> building *focused crawler *& extending it to something like NELL<http://rtw.ml.cmu.edu/rtw/><far-fetched
>>>> dream as of now :) >.[Google code link<http://code.google.com/p/bits-crawler/source/browse/>
>>>> ]
>>>>
>>>> Please let me know how you feel about the above mentioned points
>>>> [and/or if I am way off the track].
>>>>
>>>> Best,
>>>> Rishabh.
>>>>
>>>> _______________________________________________
>>>> Xapian-devel mailing list
>>>> Xapian-devel at lists.xapian.org
>>>> http://lists.xapian.org/mailman/listinfo/xapian-devel
>>>>
>>>>
>>>
>>
>>
>> --
>> Rishabh.
>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20120404/6b4ef937/attachment-0001.htm>