[Xapian-devel] A Little Help

Parth Gupta parthg.88 at gmail.com
Tue Jul 31 04:59:09 BST 2012


Hi Rishabh,

In my opinion, storing those four variables in a training file is the best
way to go ahead, so that we produce a training file in the end. This
training file can be generated in some directory, let's say /etc or /var,
in xapian-letor.

Usually these training files are used to train the model and the
corresponding model is saved.

I understand your point about storing the information in a data structure,
but I am afraid that it will become much more difficult to process the
information outside the API, say for analysis of features, data, etc.
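
To illustrate the kind of processing outside the API that a plain file makes
easy, here is a small standalone sketch that computes the mean of each
feature from a training file. It assumes a line format of
"<label> qid:<qid> <fid>:<val> ... #<comment>"; that format and the function
name feature_means() are assumptions, not something xapian-letor defines.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Compute the mean value of each feature over all lines of a training file.
// Assumed line format: "<label> qid:<qid> <fid>:<val> ... #<comment>".
// Malformed feature ids or values make std::stoi/std::stod throw.
std::map<int, double> feature_means(std::istream& in) {
    std::map<int, double> sum;
    std::map<int, int> count;
    std::string line;
    while (std::getline(in, line)) {
        // Drop the trailing comment, if any.
        std::string::size_type hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);

        std::istringstream fields(line);
        std::string token;
        fields >> token;  // the label, not needed for this analysis
        while (fields >> token) {
            std::string::size_type colon = token.find(':');
            if (colon == std::string::npos) continue;
            std::string key = token.substr(0, colon);
            if (key == "qid") continue;  // skip the query id field
            int fid = std::stoi(key);
            sum[fid] += std::stod(token.substr(colon + 1));
            ++count[fid];
        }
    }
    std::map<int, double> mean;
    for (const auto& s : sum)
        mean[s.first] = s.second / count[s.first];
    return mean;
}
```

Nothing here depends on the Letor class, so the same file can just as well be
inspected with any other tool.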

So let's stick to the convention and use the files.

Parth.

On Fri, Jul 27, 2012 at 11:11 PM, Rishabh Mehrotra <erishabh at gmail.com> wrote:

> Hello Parth,
>
> Thanks for the reply. I had similar concerns regarding the user
> friendliness of the LETOR module but then I realized that we won't be
> returning this RankList to questletor, the RankList will stay inside Letor
> and will not figure anywhere in the user's code (questletor). If this
> holds true then the user-friendliness of our module stays intact.
>
> My requirements for making RankList recognized inside the Letor class are
> based on the following points:
>
>    - It's not letor_score() for which I was planning on using RankList in
>    Letor. Sorry, I should have made this clear in my previous mail.
>    *letor_score()* doesn't need to return a RankList. Even in the current
>    implementation it returns map<Xapian::docid, double>. So no issues here.
>
>
>    - As per the current implementation of prepare_training_file(), the
>    entire training data is read into a list<RankList> and then this
>    list<RankList> is to be saved to a file. This looks a bit complex as each
>    RankList in list<RankList> has a vector<FeatureVector> and each
>    FeatureVector has 4 associated variables which need to be saved to the
>    file (this includes a map<int,double>). Saving all this nested information
>    seemed messy; I was a bit reluctant to go ahead with this, hence wanted to
>    confirm this with you.
>
> <We discussed this yesterday, but until then I hadn't looked into
> the exact nature of the data that needed to be stored in the file.>
>
>
>    - *Possible solution:* If instead we create a list<RankList> *variable
>    as part of Letor class* then the prepare_training_file() method would
>    just update this variable and as long as we have an instance of the Letor
>    class alive, we would have this variable to use in subsequent operations.
>    Hence, we won't need to save the complex looking list<RankList> data to a
>    file and then read it back.
>
>
>    - We discussed on IRC yesterday that doing so would prevent users from
>    using their own training file. If we look at the possibilities, two
>    cases arise:
>       - *User has a training file:* We take in the training file, update
>       Letor's list<RankList> variable using this file at the end of
>       prepare_training_file() function and proceed normally.
>       - *User doesn't have a training file:* If the user doesn't have a
>       training file then we would want to use an already existing training file
>       to do the training, which would require that we save the list<RankList>
>       somewhere. An alternative is to use the model learnt from this data
>       directly: instead of saving this list<RankList>, we save the model
>       parameters learnt from this data, which we do anyway in the
>       save_model() function. Doing so eliminates the need to save the
>       RankList for future use, without any extra effort.
>
>
> *Problem with going ahead with this:*
> I do not know how to include the ranklist header file in
> xapian-letor/include/letor.h.
>
> Please let me know if I have overlooked some point with respect to the
> availability of training file and the feasibility/applicability of the
> solution.
>
> Regards,
> Rishabh.
>
> On Sat, Jul 28, 2012 at 12:50 AM, Parth Gupta <parthg.88 at gmail.com> wrote:
>
>> Hi Rishabh,
>>
>> I think it's better not to expose RankList in letor.h, to keep it
>> user-friendly. So my suggestion is to convert the RankList to match the
>> following declaration in this method.
>>
>> std::map<Xapian::docid, double> letor_score(const Xapian::MSet & mset);
>>
>> So just convert the RankList into std::map<Xapian::docid, double> format in
>> the methods where you need to return it.
>>
>> Parth.
>>
>>
>> On Fri, Jul 27, 2012 at 5:06 PM, Rishabh Mehrotra <erishabh at gmail.com> wrote:
>>
>>> Hi,
>>> I had a little doubt: How do I make RankList recognizable in letor.h?
>>> *letor.h* resides in *xapian/xapian-letor/include/xapian/* whereas
>>> *ranklist.h* resides in *xapian/xapian-letor/*. I want a function in
>>> letor.cc to return a RankList, so the function declaration in letor.h
>>> requires RankList to be recognized.
>>>
>>> Thanks.
>>> Rishabh.
>>>
>>>
>>
>
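
The conversion suggested in the quoted thread, from a RankList to
std::map<Xapian::docid, double>, could be sketched roughly as below. The
struct layouts, the docid typedef and the helper name to_score_map() are
hypothetical stand-ins so that the snippet compiles without Xapian; the real
classes may look different.

```cpp
#include <map>
#include <vector>

typedef unsigned docid;  // stand-in for Xapian::docid (assumption)

// Hypothetical, reduced versions of the internal types.
struct FeatureVector {
    docid did;     // document id
    double score;  // score assigned by the ranker
};

struct RankList {
    std::vector<FeatureVector> docs;
};

// Flatten a RankList into the map type used in the public declaration,
// so that the public header does not need to mention RankList.
std::map<docid, double> to_score_map(const RankList& rl) {
    std::map<docid, double> scores;
    for (const FeatureVector& fv : rl.docs)
        scores[fv.did] = fv.score;
    return scores;
}
```

If a helper like this lives in letor.cc, the declaration of letor_score() in
letor.h only needs <map> and the Xapian types it already uses.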