prioritizing aggregated DBs

Olly Betts olly at survex.com
Fri Feb 7 22:08:01 GMT 2020


On Fri, Feb 07, 2020 at 09:33:08PM +0000, Eric Wong wrote:
> Hey all, I've been using ->add_database for a few years
> to tie sharded DBs together and it works great.
> 
> Now, I want to be able to search across several DBs
> which aren't sharded, say: linux-DB, glibc-DB, freebsd-DB.
> 
> I want to search for something across all of them, but
> prioritize results to favor one or some of those DBs over
> others.  Is there a way to do that without reindexing?

With git master you can achieve this with a PostingSource subclass as
there's a new PostingSource::reset() method which gets passed the
shard it is being called for, so you can set an extra weight
contribution based on that.  This is a replacement for
PostingSource::init() in 1.4, which doesn't know which shard it is being
called for.

You can then combine your query with this PostingSource using AND_MAYBE
(so the result set is exactly what the query matches, but each matching
document also picks up the extra weight contribution from the
PostingSource).
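
To illustrate the effect on ranking, here is a minimal sketch in plain
Python. It does not use the Xapian API at all: it only models what
"query AND_MAYBE source" does to weights, assuming the source matches
every document in a shard and returns a constant weight per shard. The
names Hit, SHARD_BOOST and and_maybe_with_shard_boost, and the boost
values, are made up for the example.

```python
from dataclasses import dataclass

# Hypothetical per-shard bonus, indexed by shard number in the order the
# databases were added (e.g. linux-DB, glibc-DB, freebsd-DB).
SHARD_BOOST = [2.0, 0.5, 0.0]


@dataclass
class Hit:
    shard: int     # which database the document came from
    docid: int     # document id within that database
    weight: float  # weight assigned by the text query alone


def and_maybe_with_shard_boost(query_hits, shard_boost):
    """Model of `query AND_MAYBE source`.

    The match set is exactly the query's hits; each hit has the bonus
    for its shard added to its weight, and results are re-ranked.
    """
    boosted = [
        Hit(h.shard, h.docid, h.weight + shard_boost[h.shard])
        for h in query_hits
    ]
    return sorted(boosted, key=lambda h: h.weight, reverse=True)


# Example query weights (invented for illustration).
hits = [Hit(0, 1, 1.0), Hit(1, 7, 1.8), Hit(2, 3, 2.5)]
ranked = and_maybe_with_shard_boost(hits, SHARD_BOOST)
for h in ranked:
    print(h.shard, h.docid, h.weight)
```

Since the bonus is only applied at search time, changing SHARD_BOOST
changes the prioritisation without touching the index.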

> Or would I fiddle with wdf_inc for all ->index_text and ->add_term
> calls on a per-DB basis?

That would probably work if you don't want to be able to vary the
prioritisation dynamically.

Cheers,
    Olly


