[Xapian-tickets] [Xapian] #378: Optimise MultiAndPostList using current weight of lhs.
Xapian
nobody at xapian.org
Mon May 25 00:59:13 BST 2009
#378: Optimise MultiAndPostList using current weight of lhs.
-------------------------+--------------------------------------------------
Reporter: richard | Owner: olly
Type: enhancement | Status: new
Priority: normal | Milestone: 1.1.2
Component: Matcher | Version: SVN trunk
Severity: normal | Blockedby:
Platform: All | Blocking:
-------------------------+--------------------------------------------------
Currently, MultiAndPostList::next() advances the sub-postlists until they
all reach a common document ID, by moving the first postlist forward, and
then calling check on all the subsequent postlists, with the document ID
that first postlist has moved to. The weights are not calculated until
after the postlists have all settled on a position (so the w_min value is
not used by MultiAndPostList - though it may be used by the sub-
postlists).
Instead, each time the lhs moves to a new position, the current weight of
that sub-postlist could be calculated. If this weight is insufficient,
when combined with the maxweights of the remaining sub-postlists, to
attain w_min(), the lhs could be moved further forward, without calling
check() on any of the subsequent posting lists; thus reducing the
frequency at which the subsequent posting lists need to be accessed. If
the subsequent posting lists are "expensive" to check for some reason
(perhaps they're external posting sources, or value ranges), this can
speed things up.
This principle could be used in AndMaybePostList too, and possibly to
OrPostList.
Speculatively, this idea could be extended, such that the skip_to() and
check() methods of posting sources would be given an extra
"current_min_weight" parameter, indicating the minimum weight required for
the specified document id. This would allow the optimisation to be
applied outside within the individual sub-postlist trees of a
MultiAndPostList.
I've got a prototype patch to MultiAndPostList, and some timings for a
particular query. In most cases, I don't see a noticeable performance
difference with the prototype patch, but in some cases, it results in a
significant speed up. On the other hand, (though I've not observed it),
it's plausible that this patch could slow things down if calculating the
weight on the lhs postlist is more expensive than calling check() on the
rhs postlist.
--
Ticket URL: <http://trac.xapian.org/ticket/378>
Xapian <http://xapian.org/>
Xapian
More information about the Xapian-tickets
mailing list