[Xapian-tickets] [Xapian] #784: Have a public API for merging MSets
Xapian
nobody at xapian.org
Sun Jun 5 07:13:53 BST 2022
#784: Have a public API for merging MSets
-----------------------------+-------------------------------
Reporter: German M. Bravo | Owner: Olly Betts
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Library API | Version:
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
-----------------------------+-------------------------------
Comment (by Olly Betts):
>> template<ITOR> Xapian::MSet merge(ITOR begin, ITOR end)
> To clarify, you mean that to be a static method of MSet? of what type
would that ITOR be? and where/how would it get the stats?
Looks like I failed to answer this part.
Probably as a static method of `MSet`, yes.
ITOR would be any iterator type returning a `Xapian::MSet` (or reference),
so you could do stuff like:
```
auto merged = Xapian::MSet::merge(mset_container.begin(),
                                  mset_container.end());
```
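To make the iterator-range idea concrete, here is a minimal standalone sketch. It does not use Xapian at all: `Hit`, `HitList` and `merge_hits` are hypothetical stand-ins for illustration, not part of any existing or proposed Xapian API. It also assumes the weights in every input list were computed against the same statistics, since otherwise they aren't comparable.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one ranked match; not a Xapian type.
struct Hit {
    unsigned docid;
    double weight;
};

// Hypothetical stand-in for an MSet: a list of hits.
using HitList = std::vector<Hit>;

// Merge every list in [begin, end) and keep the best max_items hits.
// ITOR is any iterator whose dereferenced value is a HitList (or reference).
template<typename ITOR>
HitList merge_hits(ITOR begin, ITOR end, std::size_t max_items) {
    HitList merged;
    for (ITOR it = begin; it != end; ++it) {
        const HitList& part = *it;
        merged.insert(merged.end(), part.begin(), part.end());
    }
    // Higher weight first; break ties by ascending docid for a stable order.
    auto better = [](const Hit& a, const Hit& b) {
        if (a.weight != b.weight) return a.weight > b.weight;
        return a.docid < b.docid;
    };
    std::size_t keep = std::min(max_items, merged.size());
    std::partial_sort(merged.begin(), merged.begin() + keep, merged.end(),
                      better);
    merged.resize(keep);
    return merged;
}
```

Any container of `HitList` works as input, e.g. `merge_hits(lists.begin(), lists.end(), 10)` for a `std::vector<HitList>`.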
> The stats method would be to get the stats for a query. I thought
merging was a two-step process:
>
> 1. Get stats for a query from each involved database and merge as
"total" stats; and
> 2. Get query results (queries using the total stats) and merge results
into the merged MSet
That's approximately how the matcher works internally, but that would be a
public API to allow reimplementing something equivalent to Xapian's matcher
outside of Xapian, rather than just an API for merging MSets.
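The reason step 1 in the quoted description matters can be shown with a small standalone sketch. Everything here is an assumption for illustration: `ShardStats`, `total_stats` and `idf` are hypothetical names, and the BM25-style IDF formula is a generic textbook one, not necessarily what Xapian's weighting schemes compute.

```cpp
#include <cmath>
#include <vector>

// Hypothetical per-shard statistics for a single query term.
struct ShardStats {
    unsigned long doc_count;  // documents in the shard
    unsigned long term_freq;  // documents in the shard containing the term
};

// Step 1: sum the per-shard statistics into "total" statistics.
ShardStats total_stats(const std::vector<ShardStats>& shards) {
    ShardStats total{0, 0};
    for (const ShardStats& s : shards) {
        total.doc_count += s.doc_count;
        total.term_freq += s.term_freq;
    }
    return total;
}

// A generic BM25-style inverse document frequency (illustrative only).
double idf(const ShardStats& s) {
    double n = static_cast<double>(s.term_freq);
    double N = static_cast<double>(s.doc_count);
    return std::log(1.0 + (N - n + 0.5) / (n + 0.5));
}
```

If each shard scored with its own `idf`, the same term would get a different weight in each shard, so the resulting result lists could not be merged by weight. Scoring every shard with `idf(total_stats(shards))` in step 2 keeps the weights comparable.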
> I think parallel query matching is better handled automatically inside
the matcher rather than by adding new API features which users then have
to connect up for themselves.
I'm guessing your actual motivation here is for use by xapiand, right?
What does xapiand actually need? Instead of just trying to expose a lot
of internal details of Xapian, can we make some smaller tweaks which would
allow xapiand to effectively do what it needs to via the existing API?
For example, parallel matching could clearly be done within the existing
matcher. There's a risk it might be slower if you're talking about
matching multiple local shards in parallel: I/O is the limiting factor
there, and parallel matching would likely result in a more scattered I/O
access pattern. Also, currently each shard processed locally can benefit
from a minimum weight established by the shard(s) before it, so there's
more total work to do if shards are processed in parallel. It would likely
be useful for cases where the time for a single search matters more than
total throughput and the database is mostly cached. And by doing it in
Xapian, other users of Xapian can benefit.
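The minimum-weight point can be illustrated with a toy model. This is a deliberately simplified sketch, not how Xapian's matcher is implemented: `TopK`, `accepted_sequential` and `accepted_independent` are hypothetical names, and "number of candidates accepted" is only a rough proxy for the work the matcher can't skip.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Keeps the k highest weights seen so far in a min-heap.
class TopK {
  public:
    explicit TopK(std::size_t k) : k_(k) {}

    // Returns true if w enters the current top k, false if it is rejected
    // because it does not beat the current minimum weight.
    bool offer(double w) {
        if (k_ == 0) return false;
        if (heap_.size() < k_) {
            heap_.push(w);
            return true;
        }
        if (w <= heap_.top()) return false;
        heap_.pop();
        heap_.push(w);
        return true;
    }

  private:
    std::size_t k_;
    std::priority_queue<double, std::vector<double>,
                        std::greater<double>> heap_;
};

using Shards = std::vector<std::vector<double>>;

// Sequential model: one TopK is shared, so later shards inherit the
// minimum weight established by earlier ones.
std::size_t accepted_sequential(const Shards& shards, std::size_t k) {
    TopK top(k);
    std::size_t accepted = 0;
    for (const auto& shard : shards)
        for (double w : shard)
            if (top.offer(w)) ++accepted;
    return accepted;
}

// Parallel model: every shard starts from scratch with its own TopK.
std::size_t accepted_independent(const Shards& shards, std::size_t k) {
    std::size_t accepted = 0;
    for (const auto& shard : shards) {
        TopK top(k);
        for (double w : shard)
            if (top.offer(w)) ++accepted;
    }
    return accepted;
}
```

With two shards holding weights {5, 4, 3} and {2, 1, 6} and k = 2, the shared threshold rejects the 2 and the 1 in the second shard outright, whereas independent processing has to accept both before the 6 turns up.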
--
Ticket URL: <https://trac.xapian.org/ticket/784#comment:4>
Xapian <https://xapian.org/>