[Xapian-tickets] [Xapian] #264: Optimise expand using min weight techniques
Xapian
nobody at xapian.org
Tue Dec 5 04:24:42 GMT 2023
#264: Optimise expand using min weight techniques
-------------------------+-------------------------------
Reporter: Olly Betts | Owner: Olly Betts
Type: enhancement | Status: assigned
Priority: normal | Milestone: 2.0.0
Component: Matcher | Version: git master
Severity: minor | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
-------------------------+-------------------------------
Changes (by Olly Betts):
* milestone: 1.5.0 => 2.0.0
Comment:
> Looking at the expansion code, it looks to me like we fetch the
> collection frequency for every term, which is a waste of effort when we're
> using the default `TradEWeight` weighting - it's only used by
> `Bo1EWeight`.

Addressed by commit 8dc6f72354d733db17fa564bfb5db51090a8adc3, which I'll
backport for 1.4.25.

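The general idea behind the fix, fetching a per-term statistic only when the expansion weighting scheme actually uses it, can be sketched as below. This is a minimal illustration under stated assumptions, not Xapian's expand API: `StatFlags`, `ExpandWeight`, `TradELike`, `Bo1Like`, `FakeDb` and `fetch_stats` are all hypothetical names, the two scheme classes only mirror which statistics `TradEWeight` and `Bo1EWeight` use, and the weight formulas themselves are omitted.

```cpp
#include <cstdint>
#include <string>

// Hypothetical statistic flags, loosely modelled on the idea behind
// Xapian::Weight's need_stat(); NOT Xapian's actual expand API.
enum StatFlags : unsigned {
    TERMFREQ = 1u << 0,
    COLLECTION_FREQ = 1u << 1,
};

struct TermStats {
    std::uint64_t termfreq = 0;
    std::uint64_t collfreq = 0;
};

// Hypothetical base class: a scheme registers the statistics it uses.
class ExpandWeight {
    unsigned flags_ = 0;
  protected:
    void need_stat(StatFlags f) { flags_ |= f; }
  public:
    virtual ~ExpandWeight() = default;
    bool needs(StatFlags f) const { return (flags_ & f) != 0; }
};

// Stand-in for TradEWeight: uses the termfreq, not the collection freq.
class TradELike : public ExpandWeight {
  public:
    TradELike() { need_stat(TERMFREQ); }
};

// Stand-in for Bo1EWeight: uses the collection freq, not the termfreq.
class Bo1Like : public ExpandWeight {
  public:
    Bo1Like() { need_stat(COLLECTION_FREQ); }
};

// Fake database: returns dummy values and counts each kind of lookup.
struct FakeDb {
    int termfreq_lookups = 0;
    int collfreq_lookups = 0;
    std::uint64_t get_termfreq(const std::string&) {
        ++termfreq_lookups;
        return 10;
    }
    std::uint64_t get_collection_freq(const std::string&) {
        ++collfreq_lookups;
        return 42;
    }
};

// Fetch only the statistics the weighting scheme asked for.
TermStats fetch_stats(FakeDb& db, const ExpandWeight& w,
                      const std::string& term) {
    TermStats s;
    if (w.needs(TERMFREQ)) s.termfreq = db.get_termfreq(term);
    if (w.needs(COLLECTION_FREQ)) s.collfreq = db.get_collection_freq(term);
    return s;
}
```

For example, calling `fetch_stats()` for three candidate terms with a `TradELike` scheme performs three termfreq lookups and no collection frequency lookups, while `Bo1Like` does the reverse.
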
> We should have a `need_stat()` mechanism like `Xapian::Weight` has.
This is still true, but I think the whole mechanism should be reviewed, as
it looks like it was built back when we stored the termfreq in every
posting list entry, which isn't really feasible for an updatable database
format. Maybe we can be lazier about this if there are cases where we can
rule a term out without knowing its termfreq (and for Bo1 we don't
actually use the termfreq anyway). Also, it looks like we probably
effectively double-fetch the termfreqs if `use_exact_termfreq` is
specified (except when we detect that the summed termfreq from the tree is
already exact).

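One way to rule a term out without knowing its termfreq, in the spirit of the min weight techniques in this ticket's title, is sketched below: keep the best n terms seen so far, and before fetching a candidate's termfreq, compare a cheap upper bound on its weight with the smallest weight currently kept. This is an assumption-laden illustration, not how Xapian's expand code works: it uses a toy weight r * log(N / tf) (not `TradEWeight` or `Bo1EWeight`) whose bound follows from tf >= r, and `Candidate`, `Selection`, `select_terms` and the termfreq callback are hypothetical names. A real scheme would need its own valid upper bound.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical expand candidate carrying only the cheap statistic:
// r = reltermfreq, the number of relevant documents containing the term
// (assumed to be >= 1).
struct Candidate {
    std::string term;
    std::uint64_t reltermfreq;
};

// Toy weight, NOT TradEWeight or Bo1EWeight: r * log(N / tf).
// It decreases as the termfreq tf grows.
double toy_weight(std::uint64_t r, std::uint64_t tf, std::uint64_t N) {
    return static_cast<double>(r) *
           std::log(static_cast<double>(N) / static_cast<double>(tf));
}

// A term in r relevant documents has tf >= r, so plugging in tf = r
// bounds toy_weight from above without a termfreq lookup.
double toy_upper_bound(std::uint64_t r, std::uint64_t N) {
    return toy_weight(r, r, N);
}

using Scored = std::pair<double, std::string>;  // (weight, term)

struct Selection {
    std::vector<Scored> top;   // best n terms, highest weight first
    int termfreq_lookups = 0;  // how many termfreqs were fetched
};

// Keep the best n terms. Once n are held, the smallest kept weight is the
// min weight a newcomer must beat, so a candidate whose upper bound can't
// beat it is ruled out before its termfreq is fetched.
Selection select_terms(
    const std::vector<Candidate>& cands, std::size_t n, std::uint64_t N,
    const std::function<std::uint64_t(const std::string&)>& get_termfreq) {
    Selection sel;
    if (n == 0) return sel;
    // Min-heap of kept terms; top() holds the current min weight.
    std::priority_queue<Scored, std::vector<Scored>, std::greater<Scored>> kept;
    for (const Candidate& c : cands) {
        if (kept.size() == n &&
            toy_upper_bound(c.reltermfreq, N) <= kept.top().first)
            continue;  // can't make the cut: skip the termfreq lookup
        ++sel.termfreq_lookups;
        double w = toy_weight(c.reltermfreq, get_termfreq(c.term), N);
        if (kept.size() < n) {
            kept.push({w, c.term});
        } else if (w > kept.top().first) {
            kept.pop();
            kept.push({w, c.term});
        }
    }
    while (!kept.empty()) {
        sel.top.push_back(kept.top());
        kept.pop();
    }
    std::reverse(sel.top.begin(), sel.top.end());
    return sel;
}
```

Because the bound is only an upper bound, some fetched terms still fail to make the cut; the saving comes from candidates whose bound is already at or below the current min weight.
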
Postponing further work for now, as this is an optimisation rather than a
correctness fix, and I've already made a significant improvement to the
default case.
--
Ticket URL: <https://trac.xapian.org/ticket/264#comment:8>
Xapian <https://xapian.org/>