[Xapian-tickets] [Xapian] #264: Optimise expand using min weight techniques
Xapian
nobody at xapian.org
Thu Dec 7 00:50:30 GMT 2023
#264: Optimise expand using min weight techniques
-------------------------+-------------------------------
 Reporter:  Olly Betts   |             Owner:  Olly Betts
     Type:  enhancement  |            Status:  assigned
 Priority:  normal       |         Milestone:  2.0.0
Component:  Matcher      |           Version:  git master
 Severity:  minor        |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+-------------------------------
Comment (by Olly Betts):
I've been mulling this over more. At this point the current optimisation
is essentially for the benefit of remote shards: we open a RemoteTermList,
and the remote end fetches the termfreq for every term in that termlist
from the remote shard and sends it over. That does result in fetching and
sending a fair amount of unused and redundant data, but it's likely to be
a win over fetching just the termfreq info we want, which would require
one remote protocol message exchange per term being considered.

For local shards, it's probably unhelpful: when there are multiple
documents marked as relevant in the same shard, we end up fetching the
termfreq multiple times for any term which occurs in more than one of
those relevant documents. There's the upside of not having to fetch
termfreq from shards which don't have a relevant document containing that
term (assuming we aren't asked to use exact termfreq), but that comes at
the cost of using approximated termfreqs.
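
To make the redundancy concrete, here is a minimal sketch that counts
termfreq lookups for one shard both ways. The type and function names are
made up for illustration and are not Xapian API; it assumes each term
appears at most once within a document's termlist.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a document termlist: just the terms.
using TermList = std::vector<std::string>;

// Lookups when termfreq is fetched while walking each relevant
// document's termlist separately: one lookup per (document, term) pair.
std::size_t lookups_per_document(const std::vector<TermList>& docs) {
    std::size_t n = 0;
    for (const auto& doc : docs) n += doc.size();
    return n;
}

// Lookups when each distinct term's termfreq is fetched only once.
std::size_t lookups_per_distinct_term(const std::vector<TermList>& docs) {
    std::set<std::string> seen;
    for (const auto& doc : docs) seen.insert(doc.begin(), doc.end());
    return seen.size();
}
```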

I wonder if we ought to push a merging step onto the remote, so that if
there are multiple relevant documents on a remote we merge the termlists
for those documents on the remote and send back one combined termlist
with termfreqs (so each termfreq then only needs to be fetched once).
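
A rough sketch of what such a per-shard merging step might look like,
under stated assumptions: DocTermList, MergedEntry, merge_termlists and
the get_termfreq callback are all hypothetical names invented here, not
existing Xapian classes or remote protocol messages. The point is only
that the termfreq lookup happens the first time a term is seen.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: one relevant document's termlist as (term, wdf) pairs.
using DocTermList = std::vector<std::pair<std::string, std::uint32_t>>;

// Hypothetical combined entry for one term within one shard.
struct MergedEntry {
    std::uint32_t rel_docs = 0;  // relevant documents containing the term
    std::uint64_t wdf_sum = 0;   // total wdf over those documents
    std::uint32_t termfreq = 0;  // shard termfreq, looked up once
};

// Merge the termlists of all relevant documents in a shard.
// get_termfreq stands in for whatever the backend uses to look up
// a term's frequency; it is called once per distinct term.
std::map<std::string, MergedEntry>
merge_termlists(const std::vector<DocTermList>& docs,
                const std::function<std::uint32_t(const std::string&)>& get_termfreq) {
    std::map<std::string, MergedEntry> merged;
    for (const auto& doc : docs) {
        for (const auto& [term, wdf] : doc) {
            auto [it, inserted] = merged.try_emplace(term);
            if (inserted) it->second.termfreq = get_termfreq(term);
            ++it->second.rel_docs;
            it->second.wdf_sum += wdf;
        }
    }
    return merged;
}
```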

We could have a similar merging step for just the local shards, and if
there are both local and remote shards with relevant documents, a final
merge step between the two.
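
As an illustration only, the final merge could be a plain two-way merge
of two term-sorted combined lists. Entry and merge_sorted are hypothetical
names; the sketch assumes the local and remote lists cover disjoint
shards, so that counts for a term present in both can simply be added.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical combined-termlist entry for one side (local or remote).
struct Entry {
    std::string term;
    std::uint32_t rel_docs;  // relevant documents containing the term
    std::uint64_t termfreq;  // termfreq summed over that side's shards
};

// Merge two lists which are each sorted by term with no duplicates.
// A term present on only one side keeps that side's counts, which is
// where the approximated termfreq comes from.
std::vector<Entry> merge_sorted(const std::vector<Entry>& a,
                                const std::vector<Entry>& b) {
    std::vector<Entry> out;
    out.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        const int c = a[i].term.compare(b[j].term);
        if (c < 0) {
            out.push_back(a[i++]);
        } else if (c > 0) {
            out.push_back(b[j++]);
        } else {
            // Same term on both sides: add the counts together.
            out.push_back({a[i].term,
                           a[i].rel_docs + b[j].rel_docs,
                           a[i].termfreq + b[j].termfreq});
            ++i;
            ++j;
        }
    }
    // Copy whatever remains on either side.
    while (i < a.size()) out.push_back(a[i++]);
    while (j < b.size()) out.push_back(b[j++]);
    return out;
}
```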

This seems like it would be as good or better in every case, and probably
doesn't need a huge amount of new code.

--
Ticket URL: <https://trac.xapian.org/ticket/264#comment:9>
Xapian <https://xapian.org/>