[Xapian-tickets] [Xapian] #784: Have a public API for merging MSets

Sun Jun 5 07:13:53 BST 2022

#784: Have a public API for merging MSets
-----------------------------+-------------------------------
 Reporter:  German M. Bravo  |             Owner:  Olly Betts
     Type:  enhancement      |            Status:  new
 Priority:  normal           |         Milestone:
Component:  Library API      |           Version:
 Severity:  normal           |        Resolution:
 Keywords:                   |        Blocked By:
 Blocking:                   |  Operating System:  All
-----------------------------+-------------------------------
Comment (by Olly Betts):

 >> template<typename ITOR> Xapian::MSet merge(ITOR begin, ITOR end)
 > To clarify, you mean that to be a static method of MSet? Of what type
 would that ITOR be? And where/how would it get the stats?

 Looks like I failed to answer this part.

 Probably as a static method of `MSet`, yes.

 ITOR would be any iterator type whose value type is `Xapian::MSet` (or a
 reference to one), so you could do stuff like:

 ```
 // Proposed API: Xapian::MSet::merge() does not exist (yet).
 auto merged = Xapian::MSet::merge(mset_container.begin(),
                                   mset_container.end());
 ```
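 As a rough illustration of the shape such a `merge()` could take, here is a
 self-contained sketch. It does not use Xapian at all: `Hit`, `ResultSet`,
 `merge()` and its `maxitems` parameter are hypothetical stand-ins, and it
 assumes every input was weighted using the same combined stats, so weights
 from different inputs can be compared directly. A real `Xapian::MSet` merge
 would also need to combine things like match count estimates, which this
 ignores.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an MSet entry: NOT Xapian's API.
struct Hit {
    std::string docid;  // shard-qualified document id (assumption)
    double weight;      // assumed computed with the same global stats everywhere
};

// Hypothetical stand-in for an MSet: hits sorted by descending weight.
struct ResultSet {
    std::vector<Hit> hits;
};

// Merge any iterator range whose value type is ResultSet (or a reference to
// one), keeping the `maxitems` highest-weighted hits.
template <typename ITOR>
ResultSet merge(ITOR begin, ITOR end, std::size_t maxitems) {
    ResultSet merged;
    for (ITOR it = begin; it != end; ++it) {
        // Binding to a const reference works whether *it returns a
        // ResultSet by value or by reference.
        const ResultSet& rs = *it;
        merged.hits.insert(merged.hits.end(), rs.hits.begin(), rs.hits.end());
    }
    // stable_sort keeps equal-weight hits in input order (deterministic).
    std::stable_sort(merged.hits.begin(), merged.hits.end(),
                     [](const Hit& a, const Hit& b) { return a.weight > b.weight; });
    if (merged.hits.size() > maxitems) merged.hits.resize(maxitems);
    return merged;
}
```

 Taking the range as a template parameter means any container of result
 sets can be passed, which matches the `ITOR` idea described above.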

 > The stats method would be to get the stats for a query. I thought
 merging was a two-step process:
 >
 >  1. Get stats for a query from each involved database and merge as
 "total" stats; and
 >  2. Get query results (queries using the total stats) and merge results
 into the merged MSet

 That's approximately how the match works internally, but exposing that
 would be a public API for reimplementing something equivalent to Xapian's
 matcher outside of Xapian, rather than just an API for merging MSets.
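 To make the two-step process quoted above concrete, here is a toy sketch
 for a single-term query. `ShardStats`, `combine()` and `term_weight()` are
 hypothetical, and the `wdf * log(N / n)` weight is a simplistic formula
 chosen only for illustration (it is not BM25 or any of Xapian's weighting
 schemes). What it shows is that weights computed from the combined stats are
 comparable across shards, whereas weights from per-shard stats are not.

```cpp
#include <cmath>
#include <vector>

// Hypothetical per-shard statistics for a single-term query
// (not Xapian's internal stats class).
struct ShardStats {
    double doc_count;  // documents in the shard
    double term_freq;  // documents in the shard containing the query term
};

// Step 1: merge per-shard stats into "total" stats by summing them.
ShardStats combine(const std::vector<ShardStats>& shards) {
    ShardStats total{0.0, 0.0};
    for (const ShardStats& s : shards) {
        total.doc_count += s.doc_count;
        total.term_freq += s.term_freq;
    }
    return total;
}

// Step 2: each shard weights its matches using the stats it is given.
// Passing the *total* stats to every shard makes the resulting weights
// directly comparable, so per-shard results can simply be merged by weight.
// Illustrative IDF-style weight only: wdf * log(N / n).
double term_weight(const ShardStats& stats, double wdf) {
    return wdf * std::log(stats.doc_count / stats.term_freq);
}
```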

 > I think parallel query matching is better handled automatically inside
 the matcher rather than by adding new API features which users then have
 to connect up for themselves.

 I'm guessing your actual motivation here is for use by xapiand, right?

 What does xapiand actually need?  Instead of just trying to expose a lot
 of internal details of Xapian, can we make some smaller tweaks which would
 allow xapiand to do what it needs effectively via the existing API?

 For example, parallel matching could clearly be done within the existing
 matcher.  There's a risk it might be slower if you're talking about
 matching multiple local shards in parallel, since I/O is the limiting
 factor and parallel matching would likely result in a more scattered I/O
 access pattern.  Also, currently each shard processed locally can benefit
 from a minimum weight established by the shard(s) processed before it, so
 there's more total work to do if shards are processed in parallel.  It
 would likely be useful for cases where the time for a single search
 matters more than total throughput and the database is mostly cached.
 And by doing it in Xapian, other users of Xapian can benefit too.
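 The minimum weight point can be illustrated with a toy model. This is not
 how Xapian's matcher is implemented: `Candidate`, `TopK` and
 `match_shard()` are hypothetical, and skipping a candidate whose cheap upper
 bound can't beat the current minimum weight is only loosely analogous to how
 the matcher uses weight bounds. With one shared top-k, later shards start
 from the minimum weight established by earlier ones. With independent
 per-shard top-k lists, as when shards are matched in parallel, each starts
 from zero and fully scores more documents.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical candidate: a cheap upper bound on its weight, plus the actual
// weight, which is assumed to be expensive to compute.
struct Candidate {
    double max_weight;
    double weight;
};

// Keeps the top `k` weights seen so far in a min-heap (k must be >= 1).
// Once full, the heap's smallest entry is the minimum weight needed to get in.
struct TopK {
    explicit TopK(std::size_t limit) : k(limit) {}
    std::size_t k;
    std::priority_queue<double, std::vector<double>, std::greater<double>> heap;

    double min_weight() const { return heap.size() < k ? 0.0 : heap.top(); }

    void add(double w) {
        if (heap.size() < k) {
            heap.push(w);
        } else if (w > heap.top()) {
            heap.pop();
            heap.push(w);
        }
    }
};

// Match one shard, skipping candidates whose upper bound cannot beat the
// current minimum weight. Returns how many candidates were fully scored.
std::size_t match_shard(const std::vector<Candidate>& shard, TopK& top) {
    std::size_t scored = 0;
    for (const Candidate& c : shard) {
        if (c.max_weight <= top.min_weight()) continue;  // pruned
        ++scored;
        top.add(c.weight);
    }
    return scored;
}
```

 For example, with `k = 2`, one shard holding candidates of weight 5 and 4
 and another holding weights 3 and 2, a shared `TopK` scores only the first
 shard's two documents, while independent `TopK`s score all four.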
-- 
Ticket URL: <https://trac.xapian.org/ticket/784#comment:4>
Xapian <https://xapian.org/>