[Xapian-tickets] [Xapian] #264: Optimise expand using min weight techniques

Xapian nobody at xapian.org
Thu Dec 7 00:50:30 GMT 2023


#264: Optimise expand using min weight techniques
-------------------------+-------------------------------
 Reporter:  Olly Betts   |             Owner:  Olly Betts
     Type:  enhancement  |            Status:  assigned
 Priority:  normal       |         Milestone:  2.0.0
Component:  Matcher      |           Version:  git master
 Severity:  minor        |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+-------------------------------
Comment (by Olly Betts):

 I've been mulling this over more.  At this point the current optimisation
 is essentially for the benefit of remote shards, where we open a
 RemoteTermList and the remote end fetches the termfreq for every term
 from its shard and sends it over.  That does result in fetching and
 sending a fair amount of unused and redundant data, but is likely to be a
 win over fetching just the termfreq info we want, which would require one
 remote protocol message exchange per term being considered.

 For local shards, it's probably unhelpful: when there are multiple
 documents marked as relevant in the same shard we'll end up fetching the
 termfreq multiple times for terms which occur in more than one of those
 relevant documents.  There's the upside of not having to fetch termfreq
 for shards which don't have a relevant document containing that term
 (assuming we aren't asked to use exact termfreq), but that comes at the
 cost of using approximated termfreqs.
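
 To make the redundancy concrete, here is a minimal sketch that counts
 termfreq lookups for one shard.  It is illustrative only: the function
 name and the plain-list termlists are hypothetical, not Xapian API.

```python
def count_termfreq_fetches(termlists):
    """Return (per_document, deduplicated) termfreq lookup counts.

    `termlists` holds one list of terms per relevant document, all from the
    same shard.  The name and data layout are hypothetical, not Xapian API.
    """
    # Handling each document on its own costs one lookup per term per document.
    per_document = sum(len(set(terms)) for terms in termlists)
    # Looking each distinct term up once costs one lookup per distinct term.
    deduplicated = len(set().union(*termlists))
    return per_document, deduplicated


# Three relevant documents in one shard; "banana" occurs in all of them.
docs = [["apple", "banana"], ["banana", "cherry"], ["banana"]]
print(count_termfreq_fetches(docs))  # (5, 3)
```

 The gap between the two counts grows with the overlap between the
 relevant documents in the shard.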

 I wonder if we ought to push a merging step onto the remote so that if
 there are multiple relevant documents on a remote we merge termlists for
 those documents on the remote and send one combined termlist back with
 termfreqs (each of which then only needs to be fetched once).
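
 One possible shape for such a merging step, sketched in Python rather
 than Xapian's C++.  It assumes each per-document termlist is sorted and
 free of duplicates; `merge_termlists` and `get_termfreq` are hypothetical
 names, with `get_termfreq` standing in for whatever the shard uses to
 look up a term's frequency.

```python
import heapq
from itertools import groupby


def merge_termlists(termlists, get_termfreq):
    """Merge sorted per-document termlists for one shard.

    Returns a list of (term, reltermfreq, termfreq) tuples in term order,
    where reltermfreq is the number of relevant documents containing the
    term.  `get_termfreq` is called exactly once per distinct term.
    """
    merged = []
    # heapq.merge yields all terms in sorted order, so equal terms from
    # different documents come out adjacent and groupby can collapse them.
    for term, group in groupby(heapq.merge(*termlists)):
        reltermfreq = sum(1 for _ in group)
        merged.append((term, reltermfreq, get_termfreq(term)))
    return merged


# Hypothetical shard statistics and a lookup that records each call.
shard_termfreqs = {"apple": 10, "banana": 4, "cherry": 7}
lookups = []


def lookup(term):
    lookups.append(term)
    return shard_termfreqs[term]


combined = merge_termlists([["apple", "banana"], ["banana", "cherry"]], lookup)
print(combined)  # [('apple', 1, 10), ('banana', 2, 4), ('cherry', 1, 7)]
print(lookups)   # ['apple', 'banana', 'cherry']
```

 A remote would then send the combined list back in a single reply
 instead of one termlist per relevant document.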

 We could have a similar merging step for just the local shards, and if
 there are both local and remote shards with relevant documents, a final
 merge step between the two.
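
 The final merge could be sketched as below.  This assumes each side has
 already been reduced to (term, reltermfreq, termfreq) tuples, and that
 counts from different shards can simply be added because shards hold
 disjoint sets of documents.  `combine_merged` is a hypothetical name,
 and the sketch ignores the approximated-termfreq case.

```python
def combine_merged(local, remote):
    """Combine two merged termlists into one, summing counts per term.

    Each input is an iterable of (term, reltermfreq, termfreq) tuples.
    The result is a list of the same shape, sorted by term.
    """
    totals = {}
    for source in (local, remote):
        for term, reltermfreq, termfreq in source:
            rel, tf = totals.get(term, (0, 0))
            totals[term] = (rel + reltermfreq, tf + termfreq)
    return [(term, rel, tf) for term, (rel, tf) in sorted(totals.items())]


local = [("apple", 1, 10), ("banana", 2, 4)]
remote = [("banana", 1, 6), ("cherry", 1, 7)]
print(combine_merged(local, remote))
# [('apple', 1, 10), ('banana', 3, 10), ('cherry', 1, 7)]
```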

 This seems like it would be as good or better in every case, and probably
 doesn't need a huge amount of new code.
-- 
Ticket URL: <https://trac.xapian.org/ticket/264#comment:9>
Xapian <https://xapian.org/>