Omega: Missing support for newer weighting schemes

Olly Betts olly at survex.com
Thu Apr 13 02:01:16 BST 2017


On Mon, Apr 10, 2017 at 11:47:36PM +0530, Vivek Pal wrote:
> > No, use Xapian::Registry to find the weighting scheme from the name
> > like how Weight::unserialise() does (otherwise every caller would need
> > code similar to that above).
> 
> Okay, I looked into Xapian::Registry and it seems you are referring to using
> the get_weighting_scheme method? (which expects a string e.g. "bm25" i.e. the
> name of a weighting scheme.)

Yes, you pass it a string and it gives you the registered object with that
name.

> And, Weight::unserialise() looks like throwing a Xapian::UnimplementedError
> error. I wonder if you meant the same method?

That's the default implementation - each subclass overrides that with an
actual implementation (at least if it wants to work with remote databases).

> > The code in omega would just be:
> >
> >   enq.set_weighting_scheme(Xapian::Weight::parse_params(scheme));
> 
> I wonder if we could do something more like:
> 
> enq.set_weighting_scheme(Xapian::Registry::get_weighting_scheme(name).parse_params(params));
> 
> where, Xapian::Registry::get_weighting_scheme(name) returns a weighting scheme
> object e.g. BM25Weight object and then calling parse_params method on that
> to return a BM25Weight object now with parameters values as found in params
> string.

That doesn't work as written because you need a registry object to call
get_weighting_scheme() on (it's not a static method).

But the idea is to have a static method of Xapian::Weight to do
essentially that.  It probably needs to take an optional Registry object
like Query::unserialise() does so the user can add their own custom
weighting schemes in:

    static const Query unserialise(const std::string & serialised,
                                   const Registry & reg = Registry());

> It should work given that we first separate the "scheme" string into two
> substrings namely "name" and "params" as used above.

That ought to be handled by parse_params().

> Also, we'd check if params is not null e.g. in the case of scheme = "bm25"
> and then we'd just call:
> 
> enq.set_weighting_scheme(Xapian::Registry::get_weighting_scheme(name));

This shouldn't be necessary, though it does avoid pointlessly cloning the
weight object from the registry in the case of an empty parameter string.
It means all weighting schemes would need to support being specified with
no parameters - I guess there's probably always a sane default set of
parameters.

> Also, regarding Xapian::Weight::parse_params(scheme) -- I thought we were to
> define parse_params method in each Weight subclass e.g. BM25Weight, LMWeight
> etc.?

Sorry, I was misusing that name for the static "dispatcher" method as well
as the virtual method implementing the parameter parsing.

> > There's probably a much better name than parse_params() though.
> 
> Yes, I agree. May be get_parameterized_weighting_scheme sound better? (if it
> isn't too long to read :))

That makes it sound like a getter, whereas it's really a factory method.

Also we use British English spellings in the API so it'd be "parameterised"
(we could provide inline aliases, though we haven't for other cases such as
"serialise" and "miles_to_metres" and nobody has moaned so far).

Cheers,
    Olly



More information about the Xapian-devel mailing list