[Xapian-discuss] Does Xapian support retrieval optional?

Olly Betts olly at survex.com
Mon Nov 3 05:17:47 GMT 2014


On Thu, Oct 30, 2014 at 11:37:15AM +0800, Lu Zhen wrote:
> I've been using Xapian for a while, but there is a scenario I'm not sure
> is already supported.
> 
> Suppose:
> 1. Raw query: how to make pizza
> 2. Parsed query: how AND to AND make AND pizza
> 3. Documents:
>     d1: how to make pizza at home
>     d2: 3 ways to make pizza
>     d3: make pizza in 4 easy steps
> 
> Question:
> 1. When searching, how can I retrieve d2 and d3 (even though they don't
> contain "how to")?

Set the default operator in the QueryParser to OP_OR instead of OP_AND:

http://xapian.org/docs/apidoc/html/classXapian_1_1QueryParser.html#a2efe48be88c4872afec4bc963f417ea5

The default is actually OP_OR (for historical reasons, though this will
probably get changed at some point), so you're presumably currently
setting this to OP_AND explicitly.
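
A minimal sketch of that setting, assuming the Xapian Python bindings are
installed. The helper name parse_with_or is hypothetical; QueryParser,
set_default_op, parse_query and Query.OP_OR are the names from the API
documentation linked above.

```python
def parse_with_or(text):
    """Parse `text` so that every term is optional (terms are OR-ed)."""
    # Imported inside the function so that loading this sketch does not
    # itself require the Xapian Python bindings to be installed.
    import xapian

    parser = xapian.QueryParser()
    # OP_OR is already the default; setting it explicitly documents the
    # intent and undoes any earlier set_default_op(xapian.Query.OP_AND).
    parser.set_default_op(xapian.Query.OP_OR)
    return parser.parse_query(text)
```

Passing the result of parse_with_or("how to make pizza") to
Enquire.set_query() should then match any document containing at least one
of the four terms, rather than only documents containing all of them.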

Or you could set "how" and "to" as stopwords, but that fails your
second requirement below.

> 2. Beyond that, how can I make sure the score of d1 is higher than that of
> d2 or d3 (because d1 does contain "how to")?

OP_OR sums the weight contributions from each term present, so this
will generally be the case.

(Strictly speaking, if you want a 100% guarantee, you'll need to pick a
weighting scheme and parameters which ensure that is always the
case. I think BM25 with default parameters doesn't guarantee this, but
you'd probably have to construct an artificial test case to see d1 not
rank higher, so I wouldn't worry about it myself.)
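
To make the AND versus OR difference concrete, here is a toy model in plain
Python. It is not Xapian code and not BM25: the documents are the three
from the question, and the hypothetical term_weight() gives every query
term a fixed weight of 1.0, which is an assumption made purely for
illustration. A real weighting scheme also depends on term frequency,
document length and collection statistics.

```python
docs = {
    "d1": "how to make pizza at home",
    "d2": "3 ways to make pizza",
    "d3": "make pizza in 4 easy steps",
}
query_terms = "how to make pizza".split()


def term_weight(term):
    # Placeholder weight (an assumption): every query term counts as 1.0.
    return 1.0


def and_matches(text):
    # AND semantics: the document must contain every query term.
    words = set(text.split())
    return all(term in words for term in query_terms)


def or_score(text):
    # OR semantics: sum the weights of the query terms that are present.
    words = set(text.split())
    return sum(term_weight(term) for term in query_terms if term in words)


and_hits = [name for name, text in docs.items() if and_matches(text)]

# Keep documents matching at least one term, highest score first.
or_ranking = sorted(
    ((or_score(text), name) for name, text in docs.items() if or_score(text) > 0),
    reverse=True,
)

print("AND matches:", and_hits)
print("OR ranking:", or_ranking)
```

Under these fixed weights, AND matches only d1, while OR retrieves all
three documents and ranks d1 first because it contains the most query
terms. With a real weighting scheme the ordering is not guaranteed in the
same way, which is the caveat above.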

Cheers,
    Olly


