[Xapian-tickets] [Xapian] #360: SynonymPostList always requires doclength if wdf is used
Xapian
nobody at xapian.org
Wed Feb 26 01:45:21 GMT 2014
#360: SynonymPostList always requires doclength if wdf is used
---------------------+------------------------------
Reporter: richard | Owner: olly
Type: defect | Status: assigned
Priority: normal | Milestone: 1.2.18
Component: Matcher | Version: SVN trunk
Severity: minor | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
---------------------+------------------------------
Changes (by olly):
* status: new => assigned
* milestone: 1.3.3 => 1.2.18
Comment:
OK, so I've implemented OP_MAX, but in my tests with the etext db and a
query over all the terms starting "th" it is actually slower than
OP_SYNONYM (at least under BM25), so that's not a great fix. OP_SYNONYM
is faster than OP_OR in my tests, I think because the weight calculation
doesn't require recursing into all the subpostlists.
We can skip fetching the doclength if the wdf we calculated is <=
doclength_lower_bound for the current subdatabase, since the wdf then
can't exceed the doclength. That's a cheap check which should help, so
I've implemented it in r17882. The other thing I can see that we can do
relatively easily is to handle the common case where OP_SYNONYM has only
terms as subqueries and they're all different - I think in that case the
estimated synonym wdf can't exceed the doclength.
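
As a rough illustration of the doclength_lower_bound check described above, here is a minimal standalone sketch. It is not the code from r17882: the function name capped_synonym_wdf, the termcount alias, and the fetch_doclength callback are all hypothetical, and it assumes the doclength is only needed to cap the estimated synonym wdf.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the matcher's term count type.
using termcount = std::uint32_t;

// Return the synonym wdf capped at the document length, calling
// fetch_doclength() (assumed to be the expensive step) only when the
// cheap lower-bound check can't rule out wdf > doclength.
termcount capped_synonym_wdf(termcount estimated_wdf,
                             termcount doclength_lower_bound,
                             const std::function<termcount()>& fetch_doclength)
{
    // Every document in this subdatabase has
    // doclength >= doclength_lower_bound, so a wdf at or below the
    // bound can't exceed the real doclength and needs no capping.
    if (estimated_wdf <= doclength_lower_bound)
        return estimated_wdf;

    // Otherwise we have to pay for the real doclength and cap to it.
    return std::min(estimated_wdf, fetch_doclength());
}
```

The saving depends on how often the estimated wdf falls at or below the bound, so a subdatabase containing very short documents would get little benefit from this check alone.
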
I've also committed OP_MAX (since I implemented it) in r17884.
We should backport the doclength_lower_bound optimisation for 1.2.18 if it
applies reasonably cleanly, so updating milestone to remind us to do that.
--
Ticket URL: <http://trac.xapian.org/ticket/360#comment:4>
Xapian <http://xapian.org/>