[Xapian-tickets] [Xapian] #264: Optimise expand using min weight techniques

Xapian nobody at xapian.org
Tue Dec 5 04:24:42 GMT 2023


#264: Optimise expand using min weight techniques
-------------------------+-------------------------------
 Reporter:  Olly Betts   |             Owner:  Olly Betts
     Type:  enhancement  |            Status:  assigned
 Priority:  normal       |         Milestone:  2.0.0
Component:  Matcher      |           Version:  git master
 Severity:  minor        |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+-------------------------------
Changes (by Olly Betts):

 * milestone:  1.5.0 => 2.0.0

Comment:

 > Looking at the expansion code, it looks to me like we fetch the
 collection frequency for every term, which is a waste of effort when we're
 using the default `TradEWeight` weighting - it's only used by
 `Bo1EWeight`.

 Addressed by 8dc6f72354d733db17fa564bfb5db51090a8adc3 which I'll backport
 for 1.4.25.

 > We should have a `need_stat()` mechanism like `Xapian::Weight` has.

 This is still true, but I think the whole mechanism should be reviewed, as
 it looks like it was designed back when we stored the termfreq in every
 posting list entry, which isn't really feasible for an updatable database
 format.  Maybe we can fetch the termfreq more lazily if there are cases
 where we can rule a term out without knowing it (and for Bo1 we don't
 actually use the termfreq anyway).  Also, it looks like we probably
 effectively fetch the termfreqs twice if `use_exact_termfreq` is
 specified (except when we detect that the summed termfreq from the tree is
 already exact).
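
 As a rough illustration of the `need_stat()` idea, here is a minimal
 sketch in which each expand weighting scheme declares the statistics it
 uses and the collection step fetches only those.  All the names here
 (`ExpandStat`, `ExpandWeightSketch`, `TradLike`, `Bo1Like`, `FakeDb`,
 `collect_stats()`) are hypothetical stand-ins invented for the example and
 are not Xapian's actual classes or API; the fake database just returns
 fixed numbers and counts lookups.

```cpp
#include <cstdint>
#include <string>

// Hypothetical flags naming the statistics an expand weight may want.
enum ExpandStat : unsigned {
    STAT_TERMFREQ        = 1u << 0,
    STAT_COLLECTION_FREQ = 1u << 1
};

struct TermStats {
    std::uint64_t termfreq = 0;
    std::uint64_t collfreq = 0;
};

// Stand-in for a database; it counts how often each statistic is fetched.
struct FakeDb {
    unsigned termfreq_fetches = 0;
    unsigned collfreq_fetches = 0;

    std::uint64_t get_termfreq(const std::string&) {
        ++termfreq_fetches;
        return 10;  // made-up value
    }
    std::uint64_t get_collection_freq(const std::string&) {
        ++collfreq_fetches;
        return 25;  // made-up value
    }
};

// Hypothetical base class: subclasses register what they need up front.
class ExpandWeightSketch {
    unsigned needed = 0;

  protected:
    void need_stat(ExpandStat s) { needed |= s; }

  public:
    virtual ~ExpandWeightSketch() = default;
    bool needs(ExpandStat s) const { return (needed & s) != 0; }
};

// Assumed here to use the termfreq but not the collection frequency.
struct TradLike : ExpandWeightSketch {
    TradLike() { need_stat(STAT_TERMFREQ); }
};

// Assumed here to use the collection frequency but not the termfreq.
struct Bo1Like : ExpandWeightSketch {
    Bo1Like() { need_stat(STAT_COLLECTION_FREQ); }
};

// Fetch only the statistics the weighting scheme asked for.
TermStats collect_stats(FakeDb& db, const ExpandWeightSketch& wt,
                        const std::string& term) {
    TermStats s;
    if (wt.needs(STAT_TERMFREQ))
        s.termfreq = db.get_termfreq(term);
    if (wt.needs(STAT_COLLECTION_FREQ))
        s.collfreq = db.get_collection_freq(term);
    return s;
}
```

 With this shape, calling `collect_stats()` with a `Bo1Like` weight never
 touches the termfreq, and a `TradLike` weight never touches the collection
 frequency.  It doesn't address the lazier "rule a term out first" idea or
 the `use_exact_termfreq` double fetch.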

 Postponing further work for now, as this is an optimisation rather than a
 correctness issue, and I've already made a significant improvement to the
 default case.
-- 
Ticket URL: <https://trac.xapian.org/ticket/264#comment:8>
Xapian <https://xapian.org/>