[Xapian-tickets] [Xapian] #378: Optimise MultiAndPostList using current weight of lhs

Xapian nobody at xapian.org
Tue Apr 10 13:04:22 BST 2018


#378: Optimise MultiAndPostList using current weight of lhs
-------------------------+------------------------------
 Reporter:  richard      |             Owner:  richard
     Type:  enhancement  |            Status:  assigned
 Priority:  low          |         Milestone:  1.4.x
Component:  Matcher      |           Version:  SVN trunk
 Severity:  normal       |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+------------------------------
Description changed by ankitadixit:

Old description:

> Currently, MultiAndPostList::next() advances the sub-postlists until they
> all reach a common document ID, by moving the first postlist forward, and
> then calling check on all the subsequent postlists, with the document ID
> that first postlist has moved to.  The weights are not calculated until
> after the postlists have all settled on a position (so the w_min value is
> not used by MultiAndPostList - though it may be used by the sub-
> postlists).
>
> Instead, each time the lhs moves to a new position, the current weight of
> that sub-postlist could be calculated. If this weight is insufficient,
> when combined with the maxweights of the remaining sub-postlists, to
> attain w_min(), the lhs could be moved further forward, without calling
> check() on any of the subsequent posting lists; thus reducing the
> frequency at which the subsequent posting lists need to be accessed.  If
> the subsequent posting lists are "expensive" to check for some reason
> (perhaps they're external posting sources, or value ranges), this can
> speed things up.
>
> This principle could be used in AndMaybePostList too, and possibly to
> OrPostList.
>
> Speculatively, this idea could be extended, such that the skip_to() and
> check() methods of posting sources would be given an extra
> "current_min_weight" parameter, indicating the minimum weight required
> for the specified document id.  This would allow the optimisation to be
> applied outside within the individual sub-postlist trees of a
> MultiAndPostList.
>
> I've got a prototype patch to MultiAndPostList, and some timings for a
> particular query.  In most cases, I don't see a noticeable performance
> difference with the prototype patch, but in some cases, it results in a
> significant speed up.  On the other hand, (though I've not observed it),
> it's plausible that this patch could slow things down if calculating the
> weight on the lhs postlist is more expensive than calling check() on the
> rhs postlist.

New description:

 Currently, MultiAndPostList::next() advances the sub-postlists until they
 all reach a common document ID, by moving the first postlist forward, and
 then calling check on all the subsequent postlists, with the document ID
 that first postlist has moved to.  The weights are not calculated until
 after the postlists have all settled on a position (so the w_min value is
 not used by MultiAndPostList - though it may be used by the sub-
 postlists).

 Instead, each time the lhs moves to a new position, the current weight of
 that sub-postlist could be calculated. If this weight is insufficient,
 when combined with the maxweights of the remaining sub-postlists, to
 attain w_min(), the lhs could be moved further forward, without calling
 check() on any of the subsequent posting lists; thus reducing the
 frequency at which the subsequent posting lists need to be accessed.  If
 the subsequent posting lists are "expensive" to check for some reason
 (perhaps they're external posting sources, or value ranges), this can
 speed things up.

 This principle could be used in AndMaybePostList too, and possibly to
 OrPostList.

 Speculatively, this idea could be extended, such that the skip_to() and
 check() methods of posting sources would be given an extra
 "current_min_weight" parameter, indicating the minimum weight required for
 the specified document id.  This would allow the optimisation to be
 applied outside within the individual sub-postlist trees of a
 MultiAndPostList.

 I've got a prototype patch to MultiAndPostList, and some timings for a
 particular query.  In most cases, I don't see a noticeable performance
 difference with the prototype patch, but in some cases, it results in a
 significant speed up.  On the other hand, (though I've not observed it),
 it's plausible that this patch could slow things down if calculating the
 weight on the lhs postlist is more expensive than calling check() on the
 rhs postlist.

 See http://qwikfix.co.uk/asus-contact-number/

--

--
Ticket URL: <https://trac.xapian.org/ticket/378#comment:11>
Xapian <https://xapian.org/>
Xapian



More information about the Xapian-tickets mailing list