MultiDatabase shard count limitations

Eric Wong e at 80x24.org
Sun Aug 8 20:40:04 BST 2021


Olly Betts <olly at survex.com> wrote:
> On Wed, Aug 26, 2020 at 12:56:53AM +0100, Olly Betts wrote:
> > On Tue, Aug 25, 2020 at 10:15:42PM +0000, Eric Wong wrote:
> > > So I managed to get current xapian.git (commit 61724d477edb)
> > > built with CXXFLAGS=-ggdb3, and it's closer to 100%:
> > >
> > > https://80x24.org/spew/20200825215517.GA3936@dcvr/2-perf-report-20200825-214820.gz
> > >
> > > The machine I'm working on is also significantly busier at the
> > > moment trying to reproduce an unrelated problem.
> >
> > Oh and perf samples the whole machine, not just this process.
> >
> > This seems to suggest a significant part of the problem is getting the
> > wdf upper bound for each term (which is used to bound the weight each
> > term can return).  This seems to account for 36.43% of a process total
> > of 86.62% - I think this really shouldn't take any significant time,
> > which would probably mean a 40%+ speed up for this case by itself.
> >
> > For glass, we store a global wdf upper bound for the database, and then
> > return the smaller of this and the term's collection frequency - the
> > implementation of this looks up the collection frequency when called,
> > which is stored in the first chunk of the postlist for that term.  That
> > means during a search we end up re-fetching this separately from
> > finding it when reading the postlist, with an extra cursor seek.
> >
> > It would be better to just look up each term once.  I think achieving
> > that cleanly will need a bit of refactoring.
>
> Here's a patch against xapian git master to do that:
>
> https://oligarchy.co.uk/xapian/patches/get-wdf-upper-bound-from-postlist.patch
>
> I think it'll need a bit more work for 1.4.x, but if you're able to
> test this that'd be useful.

Thanks, I tested against master
(74df7696f8603add68c6fda27e44dda2b7090093) using the SWIG-generated
Xapian.pm bindings.
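
The test is roughly equivalent to the following C++ (I actually drive
it from Perl via the bindings; the shard paths, query string, and
fetch size here are stand-ins):

  #include <xapian.h>
  #include <iostream>
  #include <string>

  int main() {
      // Combine the shards into a single multi-database by adding
      // each one; Xapian searches them as one logical database.
      Xapian::Database db;
      for (int i = 0; i < 537; ++i)
          db.add_database(Xapian::Database("shard" + std::to_string(i)));

      // Parse one query and fetch the first page of matches.
      Xapian::QueryParser qp;
      Xapian::Enquire enquire(db);
      enquire.set_query(qp.parse_query("example query"));
      Xapian::MSet mset = enquire.get_mset(0, 10);
      std::cout << mset.get_matches_estimated() << " matches\n";
  }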

With 537 shards, the time to search a particular query went from
~2 minutes to ~3s, so this patch is a huge improvement for me.

master roughly matches the speeds of Debian buster's 1.4.11 and
buster-backports' 1.4.18; it's a busy system, so small improvements
are hard to notice, but major improvements are easily noticeable.
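
As I understand Olly's explanation above, the bound itself is cheap to
compute; the cost was re-reading the collection frequency from the
first chunk of the term's postlist (an extra cursor seek) every time
the bound was asked for.  Conceptually it's just this (the names are
made up, not the actual glass code):

  #include <algorithm>
  #include <xapian.h>

  // Glass stores one wdf upper bound for the whole database, and a
  // term's wdf can never exceed its collection frequency, so the
  // per-term bound is the smaller of the two.  The patch remembers
  // the collection frequency from when the postlist is opened
  // instead of re-fetching it here.
  Xapian::termcount
  wdf_upper_bound(Xapian::termcount db_wdf_upper_bound,
                  Xapian::termcount collection_freq)
  {
      return std::min(db_wdf_upper_bound, collection_freq);
  }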

> My own tests don't show a significant improvement, but they're small
> scale and CPU-bound; I would expect the big savings to come if we're I/O
> bound as it could then avoid rereading data from disk.  Your profile
> seemed to show that's where you were.

All my Xapian DBs are on SSDs, but they're SATA-2, so not the fastest
storage.  The Xapian DBs total 278G, far more than I can fit into
the kernel page cache.

Btw, I noticed some "make check" failures so far:

  Running tests with backend "multi_glass"...
  Running test: eliteset1... FAILED
  Running test: eliteset2... FAILED
  Running test: eliteset4... FAILED
  ...
  Running tests with backend "multi_glass_remoteprog_glass"...
  Running test: eliteset1... FAILED
  Running test: eliteset2... FAILED

I think they're from the patch (it takes a while to run "make check").
My search results are as expected, but I only tested one particular
query, and I'm not using OP_ELITE_SET (I haven't really digested many
of the options :x)
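
(For anyone else unfamiliar with it: my understanding is that
OP_ELITE_SET tells the matcher to keep only the N best-weighted
subqueries rather than OR-ing all of them.  A minimal sketch; the
terms and N=3 are made up:)

  #include <xapian.h>
  #include <string>
  #include <vector>

  int main() {
      std::vector<std::string> terms = {
          "xapian", "glass", "postlist", "shard", "wdf"
      };
      // Let the matcher pick the 3 best of these subqueries instead
      // of considering all of them.
      Xapian::Query q(Xapian::Query::OP_ELITE_SET,
                      terms.begin(), terms.end(), 3);
  }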

Thanks again.


