[Xapian-tickets] [Xapian] #400: Optimise AND_MAYBE when the RHS has a maxweight of 0

Xapian nobody at xapian.org
Tue Dec 3 06:47:28 GMT 2019


#400: Optimise AND_MAYBE when the RHS has a maxweight of 0
-----------------------------+-------------------------------
 Reporter:  Richard Boulton  |             Owner:  Olly Betts
     Type:  enhancement      |            Status:  assigned
 Priority:  normal           |         Milestone:  1.5.0
Component:  Matcher          |           Version:  git master
 Severity:  minor            |        Resolution:
 Keywords:                   |        Blocked By:
 Blocking:                   |  Operating System:  All
-----------------------------+-------------------------------
Changes (by Olly Betts):

 * status:  new => assigned
 * version:  SVN trunk => git master
 * milestone:  1.4.x => 1.5.0

Comment:

 The case where the scale factor is zero (which is what the testcases in
 the patch test) has been handled since 1.4.10
 (9e1023ab5d28532e649715754d5f000038e98f2f) - I tested with the new
 testcases applied to git master and they pass.  This optimisation
 doesn't cause the problem with percentages highlighted above since we
 don't count subqueries for which factor == 0.

 We don't currently handle the case where the maxweight of the RHS is zero
 or becomes zero, but I think that's quite easy to do:

 {{{#!diff
 diff --git a/xapian-core/api/queryinternal.cc b/xapian-core/api/queryinternal.cc
 index c5148ca350e0..4c888d8b0af7 100644
 --- a/xapian-core/api/queryinternal.cc
 +++ b/xapian-core/api/queryinternal.cc
 @@ -2357,10 +2357,18 @@ QueryAndMaybe::postlist(QueryOptimiser * qopt, double factor) const
      }
      OrContext ctx(qopt, subqueries.size() - 1);
      do_or_like(ctx, qopt, factor, 0, 1);
 +    Xapian::termcount save_total_subqs = qopt->get_total_subqs();
      unique_ptr<PostList> r(ctx.postlist());
      if (!r.get()) {
         RETURN(l.release());
      }
 +    if (r->recalc_maxweight() == 0.0) {
 +       // The RHS can't contribute any weight, so can be discarded.  Reset
 +       // total_subqs in case we counted any in the RHS so that percentages
 +       // don't get messed up.
 +       qopt->set_total_subqs(save_total_subqs);
 +       RETURN(l.release());
 +    }
      RETURN(new AndMaybePostList(l.release(), r.release(),
                                 qopt->matcher, qopt->db_size));
  }
 }}}
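
 To make the control flow of the patch easier to follow in isolation, here
 is a minimal, self-contained sketch.  Every type and function in it
 (ToyPostList, ToyAndMaybe, ToyOptimiser, build_rhs, make_and_maybe) is a
 hypothetical stand-in and not Xapian's real class; it also assumes that
 any subquery counting happens while the RHS is built, which may not match
 where Xapian actually does it.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy stand-in for a postlist: it only knows its maximum possible weight.
struct ToyPostList {
    double maxweight;
    explicit ToyPostList(double w) : maxweight(w) {}
    virtual ~ToyPostList() = default;
    // Upper bound on the weight this postlist can contribute to a document.
    virtual double recalc_maxweight() { return maxweight; }
};

// Toy AND_MAYBE: matches what the LHS matches, the RHS can only add weight.
struct ToyAndMaybe : ToyPostList {
    std::unique_ptr<ToyPostList> l, r;
    ToyAndMaybe(std::unique_ptr<ToyPostList> l_,
                std::unique_ptr<ToyPostList> r_)
        : ToyPostList(l_->recalc_maxweight() + r_->recalc_maxweight()),
          l(std::move(l_)), r(std::move(r_)) {}
};

// Toy optimiser state: just the counter used for percentage calculations.
struct ToyOptimiser {
    unsigned total_subqs = 0;
    unsigned get_total_subqs() const { return total_subqs; }
    void set_total_subqs(unsigned n) { total_subqs = n; }
};

// Hypothetical helper: building the RHS counts its terms as a side effect.
std::unique_ptr<ToyPostList>
build_rhs(ToyOptimiser& qopt, double rhs_maxweight, unsigned rhs_terms) {
    qopt.set_total_subqs(qopt.get_total_subqs() + rhs_terms);
    return std::make_unique<ToyPostList>(rhs_maxweight);
}

std::unique_ptr<ToyPostList>
make_and_maybe(ToyOptimiser& qopt, std::unique_ptr<ToyPostList> l,
               double rhs_maxweight, unsigned rhs_terms) {
    assert(l);
    // Snapshot the counter so it can be restored if the RHS is discarded.
    unsigned save_total_subqs = qopt.get_total_subqs();
    std::unique_ptr<ToyPostList> r =
        build_rhs(qopt, rhs_maxweight, rhs_terms);
    if (r->recalc_maxweight() == 0.0) {
        // The RHS can't contribute any weight: drop it and undo its counts.
        qopt.set_total_subqs(save_total_subqs);
        return l;
    }
    return std::make_unique<ToyAndMaybe>(std::move(l), std::move(r));
}
```

 Calling make_and_maybe() with rhs_maxweight == 0.0 hands back the LHS
 unchanged and leaves total_subqs as it was before the call; with a
 non-zero rhs_maxweight it returns a ToyAndMaybe and the RHS terms stay
 counted.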

 I'm not sure if we actually need to restore total_subqs - it seems there
 probably can't be any weighted terms in the RHS if its overall maxweight
 is zero, but maybe with a custom weighting scheme there could be.

 Let's try to get test coverage for the above and apply for 1.5.0.  I don't
 think this additional case is worth patching 1.4.x for.

 I also think we needn't worry about the case where the max weight starts
 off non-zero but becomes zero during the match - in such a situation it
 would be more obvious to implement the PostingSource so that it simply
 signals it has reached its end rather than just setting its maxweight to
 zero, and that will be handled efficiently already.
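
 As a rough illustration of "signal the end instead of lowering maxweight
 to zero", here is a toy weight source.  ToySource and drain() are
 hypothetical and deliberately do not use the real Xapian::PostingSource
 API; the sketch also assumes the source is only used for the weight it
 adds (as on the RHS of AND_MAYBE), since skipping zero-weight entries
 would change results if it were used to select documents.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy source: (docid, weight) pairs in ascending docid order.
class ToySource {
    std::vector<std::pair<unsigned, double>> entries;
    std::size_t pos = 0;
    bool started = false;

    // Largest weight among the entries at or after the current position.
    double remaining_maxweight() const {
        double m = 0.0;
        for (std::size_t i = pos; i < entries.size(); ++i) {
            if (entries[i].second > m) m = entries[i].second;
        }
        return m;
    }

  public:
    explicit ToySource(std::vector<std::pair<unsigned, double>> e)
        : entries(std::move(e)) {}

    // Advance to the next entry (or to the first one on the first call).
    void next() {
        if (!started) {
            started = true;
        } else if (pos < entries.size()) {
            ++pos;
        }
        // Nothing left can add weight, so report the end right away
        // instead of continuing with a maxweight of zero.
        if (remaining_maxweight() == 0.0) pos = entries.size();
    }

    bool at_end() const { return started && pos >= entries.size(); }
    unsigned get_docid() const { return entries[pos].first; }
    double get_weight() const { return entries[pos].second; }
};

// Collect the docids the source yields before it signals its end.
std::vector<unsigned> drain(ToySource& src) {
    std::vector<unsigned> seen;
    for (src.next(); !src.at_end(); src.next()) {
        seen.push_back(src.get_docid());
    }
    return seen;
}
```

 With entries (1, 0.5), (4, 0.2), (7, 0.0), (9, 0.0), drain() visits only
 docids 1 and 4: once the remaining entries all have zero weight the
 source reports at_end(), so the caller never has to notice a maxweight
 of zero.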
-- 
Ticket URL: <https://trac.xapian.org/ticket/400#comment:4>
Xapian <https://xapian.org/>