[Xapian-discuss] Does Xapian support retrieval optional?

Richard Boulton richard at tartarus.org
Mon Nov 3 08:48:10 GMT 2014


Are you actually seeing such slowness if you use OP_OR, or just worried
about it hypothetically?  Xapian has several optimisations that allow it to
process many OR queries nearly as fast as AND queries.

Merging the posting lists is done at the same time as calculating scores.

However, there is OP_AND_MAYBE, which you can use to mark terms (on its
right hand side) as optional: i.e., such terms are neither required nor
sufficient to make a document match, but they do affect the weight.
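
To illustrate the matching and weighting semantics described above, here
is a minimal toy model in plain Python.  It is not Xapian code: the
function name search_and_maybe and the fixed weight of 1.0 per matching
term are assumptions made purely for illustration (real weights come from
the weighting scheme, e.g. BM25).  It uses the three documents from the
original question quoted below.

```python
# Toy model of AND_MAYBE semantics (NOT Xapian code).
# Assumption: every query term found in a document contributes a
# made-up weight of 1.0; a real weighting scheme would differ.
DOCS = {
    "d1": "how to make pizza at home",
    "d2": "3 ways to make pizza",
    "d3": "make pizza in 4 easy steps",
}


def search_and_maybe(required, optional, docs=DOCS):
    """Return (doc_id, score) pairs, best first.

    A document matches only if it contains every required term.
    Optional terms never decide whether a document matches, but each
    one that is present adds to the score.
    """
    results = []
    for doc_id, text in docs.items():
        terms = set(text.split())
        if not all(t in terms for t in required):
            continue  # optional terms alone are not sufficient
        score = sum(1.0 for t in list(required) + list(optional) if t in terms)
        results.append((doc_id, score))
    # Highest score first; ties broken by document id for stable output.
    return sorted(results, key=lambda r: (-r[1], r[0]))


# "make" and "pizza" are required, "how" and "to" are optional.
for doc_id, score in search_and_maybe(["make", "pizza"], ["how", "to"]):
    print(doc_id, score)
# d1 4.0
# d2 3.0
# d3 2.0
```

With all four terms required (the pure AND case), only d1 would match.
In Xapian itself the equivalent query would be built with OP_AND_MAYBE,
with the required part on the left hand side and the optional part on
the right hand side.
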
On 3 Nov 2014 07:19, "Lu Zhen" <yifeng133 at gmail.com> wrote:

> Thanks for your reply.
>
> But if we change the default operator to "OP_OR", it would be much slower
> when facing lots of documents.
>
> I was wondering if there is a way to label some terms of the query as
> "optional", so that merging the inverted lists would ignore these terms,
> but in the ranking process (calculating scores) these terms would still
> matter.
>
>
> 2014-11-03 13:17 GMT+08:00 Olly Betts <olly at survex.com>:
>
> > On Thu, Oct 30, 2014 at 11:37:15AM +0800, Lu Zhen wrote:
> > > I've been using Xapian for a while. But there is a scenario that I'm
> > > not sure is already supported.
> > >
> > > Suppose:
> > > 1. Raw query: how to make pizza
> > > 2. Parsed query: how AND to AND make AND pizza
> > > 3. Documents:
> > >     d1: how to make pizza at home
> > >     d2: 3 ways to make pizza
> > >     d3: make pizza in 4 easy steps
> > >
> > > Question:
> > > 1. During searching process, how to retrieve d2, d3 (although they
> > > don't contain "how to")?
> >
> > Set the default operator in the QueryParser to OP_OR instead of OP_AND:
> >
> >
> > http://xapian.org/docs/apidoc/html/classXapian_1_1QueryParser.html#a2efe48be88c4872afec4bc963f417ea5
> >
> > The default is actually OP_OR (for historical reasons, though this will
> > probably get changed at some point), so you're presumably currently
> > setting this to OP_AND explicitly.
> >
> > Or you could set "how" and "to" as stopwords, but that fails your
> > second requirement below.
> >
> > > 2. Even more, how to make sure the score of d1 is higher than d2 or d3
> > > (because d1 does contain "how to")?
> >
> > OP_OR sums the weight contributions from each term present, so this
> > will generally be the case.
> >
> > (Strictly speaking, if you want a 100% guarantee, you'll need to pick a
> > weighting scheme and parameters which will ensure that is always the
> > case - I think BM25 with default parameters doesn't give this, but
> > you'd probably have to create an artificial test case to see d1 not
> > rank higher so I wouldn't worry about it myself.)
> >
> > Cheers,
> >     Olly
> >
>
>
>
> --
> Thanks,
> Lu Zhen

