prioritizing aggregated DBs

Eric Wong e at 80x24.org
Sat Feb 8 18:04:42 GMT 2020


Olly Betts <olly at survex.com> wrote:
> On Fri, Feb 07, 2020 at 09:33:08PM +0000, Eric Wong wrote:
> > Hey all, I've been using ->add_database for a few years
> > to tie sharded DBs together and it works great.
> > 
> > Now, I want to be able to search across several DBs
> > which aren't sharded, say: linux-DB, glibc-DB, freebsd-DB.
> > 
> > I want to search for something across all of them, but
> > prioritize results to favor one or some of those DBs over
> > others.  Is there a way to do that without reindexing?
> 
> With git master you can achieve this with a PostingSource subclass as
> there's a new PostingSource::reset() method which gets passed the
> shard it is being called for, so you can set an extra weight
> contribution based on that.  This is a replacement for
> PostingSource::init() in 1.4, which doesn't know which shard it is being
> called for.
> 
> You can then combine this PostingSource with your query with AND_MAYBE
> (so it matches exactly what the query does, but takes an extra weight
> contribution from the PostingSource for matching documents).
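
A rough sketch of the AND_MAYBE idea above, written in plain Python rather than against the Xapian bindings. `ShardBoostSource`, `and_maybe` and the boost values are all hypothetical. The class only mirrors the concept of a source that is told which shard it is being used for (as `PostingSource::reset()` in git master is described above); it is not Xapian's actual PostingSource interface. The function models AND_MAYBE semantics: the match set is exactly what the query returns, and the source only adds weight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    shard: int      # index of the sub-database the document came from
    docid: int      # document id within that shard
    weight: float   # query weight plus any per-shard boost


class ShardBoostSource:
    """Hypothetical stand-in for a PostingSource subclass: it returns a
    constant extra weight that depends only on the current shard."""

    def __init__(self, boosts):
        self.boosts = boosts    # shard index -> extra weight
        self.current = 0.0

    def reset(self, shard_index):
        # Models the idea behind reset(): the source learns which shard
        # it is about to be used for and picks that shard's boost.
        self.current = self.boosts.get(shard_index, 0.0)

    def get_weight(self):
        return self.current


def and_maybe(hits_by_shard, source):
    """AND_MAYBE semantics: only documents matched by the query come back;
    the boost source adds weight to them but never adds new matches."""
    results = []
    for shard, hits in hits_by_shard.items():
        source.reset(shard)
        for docid, weight in hits:
            results.append(Hit(shard, docid, weight + source.get_weight()))
    # Highest combined weight first.
    return sorted(results, key=lambda h: h.weight, reverse=True)


# Hypothetical example: favor linux-DB over glibc-DB and freebsd-DB.
LINUX, GLIBC, FREEBSD = 0, 1, 2
query_hits = {LINUX: [(10, 1.2)], GLIBC: [(4, 2.0)], FREEBSD: [(7, 1.5)]}
ranked = and_maybe(query_hits, ShardBoostSource({LINUX: 1.0}))
print([(h.shard, h.docid) for h in ranked])  # [(0, 10), (1, 4), (2, 7)]
```

Without the boost the glibc-DB hit would rank first; with it, the linux-DB hit moves to the top while the set of matching documents stays the same. Since the boost table is only consulted at search time, it can change per query.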

Cool.  I'll keep that in mind down the line.  That could be a
while, since some users are still on 1.2 and tend to stick to
what's provided by enterprise/LTS distros.

> > Or would I fiddle with wdf_inc for all ->index_text and ->add_term
> > calls on a per-DB basis?
> 
> That would probably work if you don't want to be able to vary the
> prioritisation dynamically.
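
A toy model of the wdf_inc approach, in plain Python rather than the Xapian bindings. Xapian's `TermGenerator::index_text()` and `Document::add_term()` both accept a wdf increment. The `index_text` function below is a hypothetical simplification (no stemming, no positions), and the per-DB values in `WDF_INC` are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical priorities: a larger wdf_inc for the DB we want to favor.
# These values are fixed at index time; changing them means reindexing.
WDF_INC = {"linux-DB": 3, "glibc-DB": 2, "freebsd-DB": 1}


def index_text(text, wdf_inc=1):
    """Toy model of index_text(text, wdf_inc): each occurrence of a term
    adds wdf_inc to that term's within-document frequency (wdf).
    Unlike the real TermGenerator, this does no stemming and records
    no positions."""
    wdf = Counter()
    for term in re.findall(r"\w+", text.lower()):
        wdf[term] += wdf_inc
    return wdf


doc = "mmap fails with ENOMEM when mmap is called"
mmap_wdf = {db: index_text(doc, inc)["mmap"] for db, inc in WDF_INC.items()}
print(mmap_wdf)  # {'linux-DB': 6, 'glibc-DB': 4, 'freebsd-DB': 2}
```

Because the boost lives in the stored wdf, it only affects ranking through the weighting scheme (e.g. BM25) rather than as a fixed additive bonus, and it cannot be varied per query.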

That's a compromise I'll have to make for now.  Thanks for the
response!


