[Xapian-tickets] [Xapian] #360: SynonymPostList always requires doclength if wdf is used

Xapian nobody at xapian.org
Wed Feb 26 01:45:21 GMT 2014


#360: SynonymPostList always requires doclength if wdf is used
---------------------+------------------------------
 Reporter:  richard  |             Owner:  olly
     Type:  defect   |            Status:  assigned
 Priority:  normal   |         Milestone:  1.2.18
Component:  Matcher  |           Version:  SVN trunk
 Severity:  minor    |        Resolution:
 Keywords:           |        Blocked By:
 Blocking:           |  Operating System:  All
---------------------+------------------------------
Changes (by olly):

 * status:  new => assigned
 * milestone:  1.3.3 => 1.2.18


Comment:

 OK, so I've implemented OP_MAX, but in my tests with the etext db and all
 the terms starting "th" it is actually slower than OP_SYNONYM (at least
 under BM25), so that's not a great fix.  OP_SYNONYM is faster than OP_OR
 in my tests, I think because the weight calculation doesn't require
 recursing into all the subpostlists.
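To make the weight-calculation point concrete, here is a minimal C++ sketch, not Xapian's matcher code: OP_OR and OP_MAX both evaluate a weight for every matching subquery and then combine the results (sum vs. maximum), while OP_SYNONYM sums the wdfs and evaluates a single weight as if the group were one term. The struct `SubMatch`, the functions `term_weight`, `or_weight`, `max_weight` and `synonym_weight`, the `weight_calls` counter and the BM25-like constants are all hypothetical illustrations.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical counter so we can see how many weight evaluations each operator needs.
static int weight_calls = 0;

// Hypothetical BM25-like per-term weight; real Xapian weighting schemes use
// more statistics and different parameter handling.
double term_weight(double wdf, double doclen, double idf) {
    ++weight_calls;
    const double k1 = 1.2, b = 0.75, avg_len = 100.0;
    double norm = k1 * ((1 - b) + b * doclen / avg_len);
    return idf * wdf * (k1 + 1) / (norm + wdf);
}

// One matching subquery: its wdf in the current document and an idf-like factor.
struct SubMatch {
    double wdf;
    double idf;
};

// OP_OR-style: evaluate each subquery's weight and add them up.
double or_weight(const std::vector<SubMatch>& subs, double doclen) {
    double total = 0.0;
    for (const auto& s : subs) total += term_weight(s.wdf, doclen, s.idf);
    return total;
}

// OP_MAX-style: still evaluate each subquery's weight, but keep only the largest.
double max_weight(const std::vector<SubMatch>& subs, double doclen) {
    double best = 0.0;
    for (const auto& s : subs) best = std::max(best, term_weight(s.wdf, doclen, s.idf));
    return best;
}

// OP_SYNONYM-style: combine the wdfs first, then evaluate a single weight using
// one idf-like factor for the whole group (passed in here as an assumption).
double synonym_weight(const std::vector<SubMatch>& subs, double doclen, double group_idf) {
    double wdf = 0.0;
    for (const auto& s : subs) wdf += s.wdf;
    wdf = std::min(wdf, doclen);  // a synonym's wdf is capped at the doclength
    return term_weight(wdf, doclen, group_idf);
}
```

Counting calls with `weight_calls` shows N evaluations for the OR and MAX styles versus one for the SYNONYM style; it says nothing about actual timings, which depend on the real postlist and weighting code.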

 We can skip fetching the doclength if the wdf we calculated is <=
 doclength_lower_bound for the current subdatabase, and that's a cheap
 check which should help, so I've implemented that in r17882.  The other
 thing I can see that we can do relatively easily is to handle the common
 case where OP_SYNONYM has only terms as subqueries and they're all
 different; I think in that case the estimated synonym wdf can't exceed
 the doclength.
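A minimal C++ sketch of the clamping logic described above, assuming the synonym's wdf is the sum of its subqueries' wdfs and must not exceed the document length. `synonym_wdf`, its parameters and the `fetch_doclen` callback (standing in for the doclength lookup) are hypothetical names, not Xapian's internals. The `all_distinct_terms` shortcut is only the proposal from this comment and rests on its tentative reasoning that distinct terms' wdfs can't sum past the doclength.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>

// Hypothetical sketch: compute a synonym's wdf, fetching the doclength only when
// the cheap checks can't rule out the sum exceeding it.
unsigned synonym_wdf(unsigned wdf_sum,
                     unsigned doclength_lower_bound,
                     bool all_distinct_terms,
                     const std::function<unsigned()>& fetch_doclen) {
    // Proposed (not yet implemented) shortcut: if every subquery is a different
    // term, the comment suggests the summed wdf can't exceed the doclength.
    if (all_distinct_terms) return wdf_sum;
    // Cheap check: every document in this subdatabase is at least
    // doclength_lower_bound long, so a sum at or below it needs no clamping.
    if (wdf_sum <= doclength_lower_bound) return wdf_sum;
    // Otherwise fall back to fetching the doclength and clamping to it.
    return std::min(wdf_sum, fetch_doclen());
}
```

The ordering matters: the doclength is only looked up in the last branch, so a wdf sum at or below the subdatabase's lower bound never triggers a fetch.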

 I've also committed OP_MAX (since I implemented it) in r17884.

 We should backport the doclength_lower_bound optimisation for 1.2.18 if it
 applies reasonably cleanly, so I'm updating the milestone to remind us to do that.

--
Ticket URL: <http://trac.xapian.org/ticket/360#comment:4>
Xapian <http://xapian.org/>