MultiDatabase shard count limitations

Olly Betts olly at
Wed Aug 26 00:56:53 BST 2020

On Tue, Aug 25, 2020 at 10:15:42PM +0000, Eric Wong wrote:
> So I managed to get current xapian.git (commit 61724d477edb)
> built with CXXFLAGS=-ggdb3, and it's closer to 100%:
> The machine I'm working on is also significantly busier at the
> moment trying to reproduce an unrelated problem.

Oh and perf samples the whole machine, not just this process.

This seems to suggest a significant part of the problem is getting the
wdf upper bound for each term (which is used to bound the weight each
term can return).  This seems to account for 36.43% out of a process
total of 86.62%, i.e. about 42% of the time spent in this process.  I
think this really shouldn't take any significant time, so fixing it
would probably mean a 40%+ speed up for this case by itself.

For glass, we store a global wdf upper bound for the database, and then
return the smaller of this and the term's collection frequency.  The
implementation looks up the collection frequency each time it is called,
and that value is stored in the first chunk of the postlist for that
term.  That means during a search we end up fetching it separately from
the lookup we do anyway to read the postlist, which costs an extra
cursor seek.

It would be better to just look up each term once.  I think to achieve
that cleanly will need a bit of refactoring.
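
To make the double lookup concrete, here's a toy C++ sketch.  It is not
the glass backend code: ToyPostlistTable, seek_collection_freq() and
bound_from_known_collfreq() are hypothetical names made up for
illustration, and a std::map stands in for the postlist table.  It only
models the bound as min(global bound, collection frequency) and counts
simulated cursor seeks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy stand-in for a postlist table; NOT Xapian's real classes.
struct ToyPostlistTable {
    // term -> collection frequency, as if read from the first postlist chunk
    std::map<std::string, std::uint64_t> first_chunk_collfreq;
    std::uint64_t global_wdf_upper_bound = 0;
    mutable unsigned seeks = 0;  // number of simulated cursor seeks

    // Simulates the cursor seek needed to reach a term's first chunk.
    std::uint64_t seek_collection_freq(const std::string& term) const {
        ++seeks;
        auto it = first_chunk_collfreq.find(term);
        return it == first_chunk_collfreq.end() ? 0 : it->second;
    }

    // Bound as described above: the smaller of the global bound and the
    // term's collection frequency.  Costs one seek per call.
    std::uint64_t wdf_upper_bound(const std::string& term) const {
        return std::min(global_wdf_upper_bound, seek_collection_freq(term));
    }

    // Simulates opening the postlist, which seeks to the same chunk.
    std::uint64_t open_postlist(const std::string& term) const {
        return seek_collection_freq(term);
    }
};

// Hypothetical "look up once" variant: derive the bound from a collection
// frequency that was already read when the postlist was opened.
inline std::uint64_t bound_from_known_collfreq(std::uint64_t global_bound,
                                               std::uint64_t collfreq) {
    return std::min(global_bound, collfreq);
}
```

In this model, calling wdf_upper_bound() and then open_postlist() for
one term costs two seeks, while opening the postlist first and passing
the collection frequency it returns to bound_from_known_collfreq() costs
one.  How to get that effect cleanly in the real code is the refactoring
question.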

> Unfortunately, google-pprof can't seem to figure out symbols
> like perf can.  I had the same problem with the Debian-provided
> -dbgsym packages for 1.4.11-1 as I am having with xapian.git
> Maybe I should've tried -g instead of -ggdb3?
> (I've been mainly using -ggdb3 in C code for years,
>  and didn't recall -g itself wasn't too useful).

FWIW, it seems to work OK in debian unstable with both the packaged
xapian-core 1.4.x (which I think is built with -g) and git master built
with -g.

But we've identified something to address, so it's probably most
sensible to fix that and then reprofile.

