[Xapian-discuss] Does Xapian support retrieval optional?
Olly Betts
olly at survex.com
Mon Nov 3 05:17:47 GMT 2014
On Thu, Oct 30, 2014 at 11:37:15AM +0800, Lu Zhen wrote:
> I've been using Xapian for a while. But there is a scene I don't know
> whether supported already.
>
> Suppose:
> 1. Raw query: how to make pizza
> 2. Parsed query: how AND to AND make AND pizza
> 3. Documents:
> d1: how to make pizza at home
> d2: 3 ways to make pizza
> d3: make pizza in 4 easy steps
>
> Question:
> 1. During searching process, how to retrieve d2, d3 (although they don't
> contain "how to")?
Set the QueryParser's default operator to OP_OR instead of OP_AND, using
set_default_op():
http://xapian.org/docs/apidoc/html/classXapian_1_1QueryParser.html#a2efe48be88c4872afec4bc963f417ea5
The default is actually OP_OR (for historical reasons, though this will
probably get changed at some point), so you're presumably currently
setting this to OP_AND explicitly.
Or you could set "how" and "to" as stopwords, but then they'd be dropped
from the query and contribute nothing to the score, which fails your
second requirement below.
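A minimal sketch of the default-operator approach, assuming the Xapian Python bindings are installed. The helper name parse_any_term is made up for illustration; QueryParser, set_default_op() and Query.OP_OR are the real API:

```python
try:
    import xapian  # Xapian Python bindings, usually packaged separately
except ImportError:
    xapian = None


def parse_any_term(text):
    """Parse free text so that a document matching ANY term is retrieved.

    Hypothetical helper, not part of Xapian itself.
    """
    if xapian is None:
        raise RuntimeError("the xapian Python bindings are not installed")
    parser = xapian.QueryParser()
    # OP_OR is already the default for a fresh QueryParser; setting it
    # explicitly documents the intent and replaces any OP_AND setting.
    parser.set_default_op(xapian.Query.OP_OR)
    return parser.parse_query(text)


if __name__ == "__main__":
    # Prints the parsed query tree; the terms should be joined with OR.
    print(parse_any_term("how to make pizza"))
```

Passing the resulting query to an Enquire object should then match d1, d2 and d3, since each contains at least one of the terms.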
> 2. Even more, how to make sure the score of d1 is higher than d2 or d3
> (because d1 does contain "how to")?
OP_OR sums the weight contributions from each term present, so this
will generally be the case.
(Strictly speaking, if you want a 100% guarantee, you'll need to pick a
weighting scheme and parameters which ensure that is always the case.
I think BM25 with default parameters doesn't give this, but you'd
probably have to create an artificial test case to see d1 not rank
higher, so I wouldn't worry about it myself.)
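To illustrate the summing, here is a toy model in plain Python. It is not Xapian's BM25: the per-term weights are invented constants, and real weights also depend on things like within-document frequency and document length, which is why the strict guarantee needs care.

```python
# Toy model of OP_OR scoring: a document's score is the sum of the weights
# of the query terms it contains. The weights are hypothetical constants.
QUERY_TERMS = ["how", "to", "make", "pizza"]
TERM_WEIGHTS = {"how": 1.0, "to": 0.5, "make": 1.5, "pizza": 2.0}

DOCS = {
    "d1": "how to make pizza at home",
    "d2": "3 ways to make pizza",
    "d3": "make pizza in 4 easy steps",
}


def or_score(text):
    """Sum the weights of the query terms present in the text."""
    present = set(text.lower().split())
    return sum(TERM_WEIGHTS[t] for t in QUERY_TERMS if t in present)


def rank(docs):
    """Return (name, score) pairs for matching documents, best first."""
    scored = [(name, or_score(text)) for name, text in docs.items()]
    matched = [(name, score) for name, score in scored if score > 0]
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank(DOCS):
        print(name, score)
```

With these made-up weights d1 matches all four terms and scores 5.0, d2 matches three ("to", "make", "pizza") and scores 4.0, and d3 matches two and scores 3.5. When weights are fixed per term like this, a document matching a superset of terms always scores higher; with document-dependent weights that is no longer automatic.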
Cheers,
Olly