Boosted fields search in Python

Katja Abramova katja.abramova at dimension.it
Thu Aug 9 10:09:43 BST 2018


Hi,

I'm using Xapian in Python 2. I'm trying to replicate an analysis that
somebody else performed in Lucene. To do that I need to search for a
multi-word query in which particular fields are boosted, preferably at
query time. That is, given a query like "the cat is lying on the mat" (with
an OR operator, ignoring word positions, but with stemming applied and stop
words removed), I'd like to search for that query in both, say, the Title and
Body fields of the documents, with Title boosted by a factor of 4 and Body by 2.

I have seen an answer in the FAQ (https://trac.xapian.org/wiki/FAQ/ExtraWeight),
but:
1. it is not immediately clear to me how to translate the examples given
there to Python
2. the examples boost single terms, not fields, and not multi-word queries

I have implemented a workaround that manually attaches prefixes to terms
and combines everything with an OR, like this:

import xapian

# query_terms: already stemmed, stop-word-filtered terms ('S' = Title, 'XD' = Body)
subqueries = []
subqueries.extend([xapian.Query(xapian.Query.OP_SCALE_WEIGHT,
                                xapian.Query('S' + term), 4)
                   for term in query_terms])
subqueries.extend([xapian.Query(xapian.Query.OP_SCALE_WEIGHT,
                                xapian.Query('XD' + term), 2)
                   for term in query_terms])
query = xapian.Query(xapian.Query.OP_OR, subqueries)

However:
1. It seems overly complicated (and I'm not even sure it is correct).
2. I don't know how to access the terms of a parsed query other than by
manually parsing and stemming the query string. Is there a function for
that?

Thanks,
Katja


More information about the Xapian-discuss mailing list