prioritizing aggregated DBs
Eric Wong
e at 80x24.org
Sat Feb 8 18:04:42 GMT 2020
Olly Betts <olly at survex.com> wrote:
> On Fri, Feb 07, 2020 at 09:33:08PM +0000, Eric Wong wrote:
> > Hey all, I've been using ->add_database for a few years
> > to tie sharded DBs together and it works great.
> >
> > Now, I want to be able to search across several DBs
> > which aren't sharded, say: linux-DB, glibc-DB, freebsd-DB.
> >
> > I want to search for something across all of them, but
> > prioritize results to favor one or some of those DBs over
> > others. Is there a way to do that without reindexing?
>
> With git master you can achieve this with a PostingSource subclass as
> there's a new PostingSource::reset() method which gets passed the
> shard it is being called for, so you can set an extra weight
> contribution based on that. This is a replacement for
> PostingSource::init() in 1.4, which doesn't know which shard it is being
> called for.
>
> You can then combine this PostingSource with your query with AND_MAYBE
> (so it matches exactly what the query does, but takes an extra weight
> contribution from the PostingSource for matching documents).

Cool. I'll keep that in mind down the line. That could be a
while since some users are still on 1.2 and tend to stick to
what's provided by enterprise/LTS distros.

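As a rough illustration of the AND_MAYBE idea described above, here is a toy model in plain Python. It makes no Xapian calls: the function name, the (shard, docid, weight) tuples and the boost values are all hypothetical, and a real implementation would be a PostingSource subclass as Olly describes. It only shows the intended effect: the set of matches stays the same, and documents from a favoured DB get an extra weight contribution.

```python
# Toy model only: plain Python, no Xapian API. All names are hypothetical.

def and_maybe_boost(hits, shard_boost):
    """Re-rank (shard, docid, weight) hits by adding a per-shard bonus.

    Mirrors the AND_MAYBE idea: nothing is added to or removed from the
    match set; only the weights of already-matching documents change.
    """
    boosted = [
        (shard, docid, weight + shard_boost.get(shard, 0.0))
        for shard, docid, weight in hits
    ]
    # Highest combined weight first; sorted() is stable, so ties keep
    # their input order.
    return sorted(boosted, key=lambda hit: hit[2], reverse=True)


# Made-up query results from three unsharded DBs.
hits = [
    ("glibc-DB", 7, 2.5),
    ("linux-DB", 3, 2.0),
    ("freebsd-DB", 9, 1.0),
]

# Favour linux-DB with an assumed extra weight of 1.0.
ranked = and_maybe_boost(hits, {"linux-DB": 1.0})
for shard, docid, weight in ranked:
    print(shard, docid, weight)
```

Because the bonus is applied at query time, it can be changed per search without touching the indexes.
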
> > Or would I fiddle with wdf_inc for all ->index_text and ->add_term
> > calls on a per-DB basis?
>
> That would probably work if you don't want to be able to vary the
> prioritisation dynamically.

That's a compromise I'll have to make, for now. Thanks for the
response!
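
The per-DB wdf_inc approach discussed above can be pictured with another toy model in plain Python. Again, nothing here is Xapian API: `index_text` below is a hypothetical stand-in for the indexing calls, and the increments in `WDF_INC_BY_DB` are placeholder values. How much a larger wdf actually moves a document up the ranking depends on the weighting scheme in use, so the numbers would need tuning.

```python
# Toy model only: plain Python, no Xapian API. All names are hypothetical.
from collections import Counter

# Assumed per-DB increments; the favoured DB gets a larger one.
WDF_INC_BY_DB = {"linux-DB": 2, "glibc-DB": 1, "freebsd-DB": 1}


def index_text(text, wdf_inc=1):
    """Return {term: wdf}, adding wdf_inc for every occurrence of a term."""
    wdf = Counter()
    for term in text.lower().split():
        wdf[term] += wdf_inc
    return dict(wdf)


doc = "fix memory leak in memory allocator"
favoured = index_text(doc, WDF_INC_BY_DB["linux-DB"])
plain = index_text(doc, WDF_INC_BY_DB["glibc-DB"])

print(favoured["memory"], plain["memory"])
```

The increment is baked in when each DB is indexed, which is why this route does not allow the prioritisation to vary from one search to the next.
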