[Xapian-tickets] [Xapian] #378: Optimise MultiAndPostList using current weight of lhs
Xapian
nobody at xapian.org
Tue Apr 10 22:48:16 BST 2018
#378: Optimise MultiAndPostList using current weight of lhs
-------------------------+------------------------------
Reporter: richard | Owner: richard
Type: enhancement | Status: assigned
Priority: low | Milestone: 1.4.x
Component: Matcher | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
-------------------------+------------------------------
Description changed by olly:
Old description:
> Currently, MultiAndPostList::next() advances the sub-postlists until they
> all reach a common document ID, by moving the first postlist forward, and
> then calling check on all the subsequent postlists, with the document ID
> that first postlist has moved to. The weights are not calculated until
> after the postlists have all settled on a position (so the w_min value is
> not used by MultiAndPostList - though it may be used by the sub-
> postlists).
>
> Instead, each time the lhs moves to a new position, the current weight of
> that sub-postlist could be calculated. If this weight is insufficient,
> when combined with the maxweights of the remaining sub-postlists, to
> attain w_min(), the lhs could be moved further forward, without calling
> check() on any of the subsequent posting lists; thus reducing the
> frequency at which the subsequent posting lists need to be accessed. If
> the subsequent posting lists are "expensive" to check for some reason
> (perhaps they're external posting sources, or value ranges), this can
> speed things up.
>
> This principle could be used in AndMaybePostList too, and possibly to
> OrPostList.
>
> Speculatively, this idea could be extended, such that the skip_to() and
> check() methods of posting sources would be given an extra
> "current_min_weight" parameter, indicating the minimum weight required
> for the specified document id. This would allow the optimisation to be
> applied outside within the individual sub-postlist trees of a
> MultiAndPostList.
>
> I've got a prototype patch to MultiAndPostList, and some timings for a
> particular query. In most cases, I don't see a noticeable performance
> difference with the prototype patch, but in some cases, it results in a
> significant speed up. On the other hand, (though I've not observed it),
> it's plausible that this patch could slow things down if calculating the
> weight on the lhs postlist is more expensive than calling check() on the
> rhs postlist.
>
> See http://qwikfix.co.uk/asus-contact-number/
New description:
Currently, MultiAndPostList::next() advances the sub-postlists until they
all reach a common document ID, by moving the first postlist forward, and
then calling check on all the subsequent postlists, with the document ID
that first postlist has moved to. The weights are not calculated until
after the postlists have all settled on a position (so the w_min value is
not used by MultiAndPostList - though it may be used by the sub-
postlists).
Instead, each time the lhs moves to a new position, the current weight of
that sub-postlist could be calculated. If this weight is insufficient,
when combined with the maxweights of the remaining sub-postlists, to
attain w_min(), the lhs could be moved further forward, without calling
check() on any of the subsequent posting lists; thus reducing the
frequency at which the subsequent posting lists need to be accessed. If
the subsequent posting lists are "expensive" to check for some reason
(perhaps they're external posting sources, or value ranges), this can
speed things up.
This principle could be used in AndMaybePostList too, and possibly to
OrPostList.
Speculatively, this idea could be extended, such that the skip_to() and
check() methods of posting sources would be given an extra
"current_min_weight" parameter, indicating the minimum weight required for
the specified document id. This would allow the optimisation to be
applied outside within the individual sub-postlist trees of a
MultiAndPostList.
I've got a prototype patch to MultiAndPostList, and some timings for a
particular query. In most cases, I don't see a noticeable performance
difference with the prototype patch, but in some cases, it results in a
significant speed up. On the other hand, (though I've not observed it),
it's plausible that this patch could slow things down if calculating the
weight on the lhs postlist is more expensive than calling check() on the
rhs postlist.
--
--
Ticket URL: <https://trac.xapian.org/ticket/378#comment:11>
Xapian <https://xapian.org/>
Xapian
More information about the Xapian-tickets
mailing list