Boosted fields search in Python

James Aylett james-xapian at tartarus.org
Thu Aug 9 21:35:38 BST 2018


On 9 Aug 2018, at 10:09, Katja Abramova <katja.abramova at dimension.it> wrote:

> I need to do a search for a
> multi-word query in which particular fields are boosted - preferably at
> query time. That is, given a query like "the cat is lying on the mat" (with
> an OR operator, ignoring word positions but with stemming and stop words
> removed), I'd like to search for that query in both, say Title and Body of
> the documents but with Title field boosted to 4 and Body to 2.

Hi, Katja!

There are a few different things going on here, so I'll try to go through them one at a time.

Field searching in Xapian is generally done using term prefixes; the practical example in our "getting started" guide discusses this, and has sample code in Python. I'd read the guide from the beginning, including the core concepts (https://getting-started-with-xapian.readthedocs.io/).

It also shows how to use the QueryParser to split and stem user-inputted queries into Xapian Query objects. You'll want to set the default_prefix when you call QueryParser::parse_query (this is covered in the concepts section of the getting started guide: https://getting-started-with-xapian.readthedocs.io/en/latest/concepts/indexing/terms.html?highlight=default_prefix#fields-and-term-prefixes).
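To see why a default prefix restricts a query to one field, here is a toy model in plain Python. It is not the Xapian API and does no stemming. Each field's words are stored as terms with the field's prefix glued on, so a query parsed with default prefix "S" can only match title terms. The stop list and the unprefixed body layout are assumptions made for this sketch.

```python
# Toy model of Xapian-style prefixed terms; NOT the Xapian API, and no stemming.
STOPWORDS = {"the", "is", "on", "a", "an"}  # hypothetical stop list


def terms(text, prefix=""):
    """Lowercase, split on whitespace, drop stop words, then add the field prefix."""
    return {prefix + word for word in text.lower().split() if word not in STOPWORDS}


# One document: title terms get the "S" prefix, body terms are left unprefixed
# (a hypothetical layout chosen for this sketch).
doc_terms = terms("The cat", "S") | terms("A dog is lying on the mat")


def matches(query, default_prefix=""):
    """OR semantics: True if any prefixed query term occurs in the document."""
    return bool(terms(query, default_prefix) & doc_terms)


print(sorted(doc_terms))
print(matches("cat", "S"), matches("cat"))  # "cat" is only in the title
```

Here "cat" only matches when searched with the "S" prefix, while "mat" only matches unprefixed, which is the same separation that default_prefix gives you with real prefixed terms.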

You'll end up with Python that looks a little like this:

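import xapian

# Set up the queryparser (stemming, for instance).
# See the getting started guide for a complete example.
queryparser = xapian.QueryParser()
queryparser.set_stemmer(xapian.Stem("en"))
queryparser.set_stemming_strategy(xapian.QueryParser.STEM_SOME)

# S = Subject. Note that you can't use a keyword argument for default_prefix, so we have
# to provide the flags as well.
title_query = queryparser.parse_query(querystring, xapian.QueryParser.FLAG_DEFAULT, "S")
# XB = Body. This prefix is hypothetical: use whichever prefix you indexed the body with.
body_query = queryparser.parse_query(querystring, xapian.QueryParser.FLAG_DEFAULT, "XB")

Then you need to use OP_SCALE_WEIGHT, as you've identified, to apply a different weight to the query parsed against each field:

weighted_title_query = xapian.Query(xapian.Query.OP_SCALE_WEIGHT, title_query, 4)
weighted_body_query = xapian.Query(xapian.Query.OP_SCALE_WEIGHT, body_query, 2)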

Finally, you need to combine the two weighted queries. You can do this using OP_OR, which adds the weights together, so a document whose title and body both match will rank higher. Alternatively, OP_MAX may work better: it uses the weight of whichever subquery scores higher for that document, which will probably be the more heavily boosted one. Something like this:

final_query = xapian.Query(xapian.Query.OP_MAX, [weighted_title_query, weighted_body_query])
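The difference between the two operators can be seen in a toy model in plain Python (not Xapian code). It assumes OP_OR sums the boosted subquery weights and OP_MAX takes the largest one, and the per-field scores for the two documents are made up for illustration.

```python
# Toy model of how OP_OR and OP_MAX combine boosted subquery weights.
# NOT Xapian code; the per-field scores below are made up.
TITLE_BOOST, BODY_BOOST = 4, 2

# Hypothetical unboosted (title, body) scores for each document.
docs = {
    "A": (1.0, 0.0),  # strong title match, no body match
    "B": (0.6, 0.9),  # weaker matches in both fields
}


def combined(scores, op):
    title, body = scores
    weighted = (TITLE_BOOST * title, BODY_BOOST * body)  # like OP_SCALE_WEIGHT
    # OP_OR adds the subquery weights; OP_MAX keeps only the largest one.
    return sum(weighted) if op == "or" else max(weighted)


def ranking(op):
    """Document names ordered best first under the given operator."""
    return sorted(docs, key=lambda name: combined(docs[name], op), reverse=True)


print("OP_OR: ", ranking("or"))
print("OP_MAX:", ranking("max"))
```

Under OP_OR, B (matching in both fields, 2.4 + 1.8) beats A (4.0); under OP_MAX, A wins because its single boosted title match outscores B's best field.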

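(Note that boosting title to 4 and body to 2 will give the same ranking as boosting title to 2 and leaving body at standard weighting: scaling every part of the query by the same factor scales every document's weight by that factor, so the order of results doesn't change. Of course, if you have a more complex search structure going on, then the absolute values may still matter!)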
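The point about 4 and 2 versus 2 and 1 comes down to arithmetic: multiplying both boosts by the same factor multiplies every document's combined weight by that factor, whether the weights are summed (as with OP_OR) or maximised (as with OP_MAX), so only the ratio between the boosts affects the order. The sketch below checks this in plain Python with made-up per-field scores; it models the combination, not Xapian's actual weighting code.

```python
import operator

# Made-up unboosted (title, body) scores, standing in for the weights
# Xapian would compute for each field's subquery.
docs = {
    "doc1": (1.2, 0.0),
    "doc2": (0.5, 1.1),
    "doc3": (0.0, 2.0),
    "doc4": (0.8, 0.7),
}


def rank(title_boost, body_boost, combine):
    """Order documents by combine(boosted title, boosted body), best first."""
    scores = {
        name: combine(title_boost * title, body_boost * body)
        for name, (title, body) in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Same 2:1 ratio, different absolute boosts: identical orderings.
for combine in (operator.add, max):  # operator.add models OP_OR, max models OP_MAX
    assert rank(4, 2, combine) == rank(2, 1, combine)

# A different ratio (1:1) can change the ordering.
print(rank(4, 2, operator.add))  # ['doc1', 'doc4', 'doc2', 'doc3']
print(rank(1, 1, operator.add))  # ['doc3', 'doc2', 'doc4', 'doc1']
```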

Hope this helps!

J

-- 
 James Aylett, occasional troublemaker & project governance
 xapian.org




More information about the Xapian-discuss mailing list