[Xapian-tickets] [Xapian] #378: Optimise MultiAndPostList using current weight of lhs.

Xapian nobody at xapian.org
Mon May 25 00:59:13 BST 2009


#378: Optimise MultiAndPostList using current weight of lhs.
-------------------------+--------------------------------------------------
 Reporter:  richard      |       Owner:  olly     
     Type:  enhancement  |      Status:  new      
 Priority:  normal       |   Milestone:  1.1.2    
Component:  Matcher      |     Version:  SVN trunk
 Severity:  normal       |   Blockedby:           
 Platform:  All          |    Blocking:           
-------------------------+--------------------------------------------------
 Currently, MultiAndPostList::next() advances the sub-postlists until they
 all reach a common document ID, by moving the first postlist forward, and
 then calling check on all the subsequent postlists, with the document ID
 that first postlist has moved to.  The weights are not calculated until
 after the postlists have all settled on a position (so the w_min value is
 not used by MultiAndPostList - though it may be used by the sub-
 postlists).

 Instead, each time the lhs moves to a new position, the current weight of
 that sub-postlist could be calculated. If this weight is insufficient,
 when combined with the maxweights of the remaining sub-postlists, to
 attain w_min(), the lhs could be moved further forward, without calling
 check() on any of the subsequent posting lists; thus reducing the
 frequency at which the subsequent posting lists need to be accessed.  If
 the subsequent posting lists are "expensive" to check for some reason
 (perhaps they're external posting sources, or value ranges), this can
 speed things up.

 This principle could be used in AndMaybePostList too, and possibly to
 OrPostList.

 Speculatively, this idea could be extended, such that the skip_to() and
 check() methods of posting sources would be given an extra
 "current_min_weight" parameter, indicating the minimum weight required for
 the specified document id.  This would allow the optimisation to be
 applied outside within the individual sub-postlist trees of a
 MultiAndPostList.

 I've got a prototype patch to MultiAndPostList, and some timings for a
 particular query.  In most cases, I don't see a noticeable performance
 difference with the prototype patch, but in some cases, it results in a
 significant speed up.  On the other hand, (though I've not observed it),
 it's plausible that this patch could slow things down if calculating the
 weight on the lhs postlist is more expensive than calling check() on the
 rhs postlist.

-- 
Ticket URL: <http://trac.xapian.org/ticket/378>
Xapian <http://xapian.org/>
Xapian



More information about the Xapian-tickets mailing list