xapian-letor: FeatureVector discussion

Ayush Tomar ayushtomar at gmail.com
Mon Jun 27 14:49:15 BST 2016


Hi Parth,

James might have more to say on the second approach. It wasn't discussed
in detail, and I don't fully understand how it would work without some
form of serialisation.
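
To make my question concrete, here is a rough sketch of how I currently
read the second approach ("lots of little classes"). Every name in it
(DocStats, Feature, DocLengthFeature, TermFrequencyFeature, build_fvals)
is made up for illustration; none of this is existing xapian-letor code,
and James may have had something different in mind.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for whatever per-document data a feature needs.
struct DocStats {
    double doc_length;
    double term_frequency;
};

// Hypothetical base class: each feature becomes one small subclass.
class Feature {
  public:
    virtual ~Feature() {}
    virtual std::string name() const = 0;
    virtual double compute(const DocStats& stats) const = 0;
};

class DocLengthFeature : public Feature {
  public:
    std::string name() const override { return "doc_length"; }
    double compute(const DocStats& stats) const override {
        return stats.doc_length;
    }
};

class TermFrequencyFeature : public Feature {
  public:
    std::string name() const override { return "term_frequency"; }
    double compute(const DocStats& stats) const override {
        return stats.term_frequency;
    }
};

// Build the feature values from whichever features were configured.
// Position i of the result corresponds to position i of 'features'.
std::vector<double>
build_fvals(const std::vector<std::unique_ptr<Feature>>& features,
            const DocStats& stats)
{
    std::vector<double> fvals;
    fvals.reserve(features.size());
    for (const auto& f : features) {
        fvals.push_back(f->compute(stats));
    }
    return fvals;
}
```

In this layout the meaning of fvals[i] depends entirely on the order of
the configured list, which is why I suspect that list (for example the
name() of each feature) would have to be stored alongside a trained model.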

On Mon, Jun 27, 2016 at 6:08 PM, Parth Gupta <pargup8 at gmail.com> wrote:

> Hi Ayush
>
> Thanks for bringing up the issue for discussion. It is still possible to
> use enums as feature IDs without relying on their order; we simply define
> the IDs in whatever way we need. Usually a good approach is to group
> similar features, e.g. term-document score features such as the BM25
> score, LM score, etc. go in a separate group with a specific ID range. New
> features can either extend the present range or be accommodated within
> it.
>
> The rankers will rank a particular instance with the features present (not
> necessarily all of them, and not necessarily in order). In fact, a user can
> specify which features s/he wants to work with, and the feature manager
> will ensure they are calculated and update 'fvals'.
>
> I am still missing some bits on the second approach. Can you please give a
> little more information on it?
>
> Cheers
> Parth
>
>
> On Mon, Jun 27, 2016 at 5:46 PM, Ayush Tomar <ayushtomar at gmail.com> wrote:
>
>> Hello James, Parth,
>>
>> Following our discussion on IRC and in code review, the way the
>> FeatureVector class works needs some discussion.
>>
>> Presently, the FeatureVector class is defined as follows, with a fixed
>> feature count (19):
>>
>> class FeatureVector::Internal : public Xapian::Internal::intrusive_base{
>>     friend class FeatureVector;
>>     double label;
>>     double score;
>>     std::map<int,double> fvals;
>>     int fcount;
>>     Xapian::docid did;
>>
>> The two approaches that were discussed were:
>> 1. Using enums as IDs for features in fvals.
>> 2. Making fvals into a configurable vector of feature values.
>>
>> The issues were that the first way would still assume an order in which
>> the features occur, and the second way would require the feature generation
>> code to be split into lots of little classes, which might be overhead
>> right now but would be useful functionality to have in future.
>>
>> What would be the best approach here?
>> --
>>
>> ----------------------------------------------------------------------------
>> Kind Regards,
>> Ayush Tomar | My Webpage <http://ayshtmr.xyz> | LinkedIn
>> <https://in.linkedin.com/in/ayushtomar>
>>
>
>
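
On the first approach, this is roughly how I picture the grouped ID ranges
Parth describes, with the user picking a subset of features. It is only a
sketch: the enum names, the ranges, the FeatureValues typedef and the
select_features helper are invented for illustration and are not existing
xapian-letor code.

```cpp
#include <map>
#include <vector>

// Hypothetical feature IDs, grouped by kind into ID ranges so that a new
// feature can be added to its group without renumbering the others.
enum FeatureId {
    // Term-document score features: 100-199.
    FEATURE_BM25 = 100,
    FEATURE_LM = 101,
    // Document-only features: 200-299.
    FEATURE_DOC_LENGTH = 200
};

// Stand-in for the 'fvals' member: feature ID -> feature value.
typedef std::map<int, double> FeatureValues;

// Keep only the features the user asked for. IDs that were requested but
// never calculated are skipped, so nothing relies on the IDs being
// contiguous or on all of them being present.
FeatureValues
select_features(const FeatureValues& all, const std::vector<int>& wanted)
{
    FeatureValues selected;
    for (int id : wanted) {
        FeatureValues::const_iterator it = all.find(id);
        if (it != all.end()) {
            selected[id] = it->second;
        }
    }
    return selected;
}
```

Because the values are looked up by ID, a ranker using this would not
depend on the order in which the features were calculated.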


-- 
----------------------------------------------------------------------------
Kind Regards,
Ayush Tomar | My Webpage <http://ayshtmr.xyz> | LinkedIn
<https://in.linkedin.com/in/ayushtomar>