MultiDatabase shard count limitations

Eric Wong e at 80x24.org
Fri Aug 21 10:06:59 BST 2020


Going back to the "prioritizing aggregated DBs" thread from
February 2020, I've got 390 Xapian shards for 130 public inboxes
I want to search against(*).  There are more on the horizon (we're
expecting tens of thousands of public inboxes).

After bumping RLIMIT_NOFILE and calling ->add_database for every
shard, the actual queries seem to be taking ~30s (not good :x).
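
To put rough numbers on the descriptor side of this, here is a minimal
sketch in Python (stdlib only, no Xapian calls).  The per-shard
descriptor count of 6, the headroom of 64, and the 10,000-inbox
projection are assumptions for illustration, not measured values; only
the 390 shards / 130 inboxes figures come from above.

```python
import resource

def estimate_fds(shards, fds_per_shard, headroom=64):
    """Rough count of descriptors needed to keep every shard open at once."""
    return shards * fds_per_shard + headroom

def raise_nofile(needed):
    """Raise the soft RLIMIT_NOFILE toward `needed`, capped at the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != resource.RLIM_INFINITY and soft < needed:
        target = needed if hard == resource.RLIM_INFINITY else min(needed, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]

FDS_PER_SHARD = 6                 # assumption: files held open per shard
shards_per_inbox = 390 // 130     # 3, from the figures above

current = estimate_fds(390, FDS_PER_SHARD)
# assumption: 10,000 inboxes as a low-end reading of "tens of thousands"
projected = estimate_fds(10_000 * shards_per_inbox, FDS_PER_SHARD)

print("descriptors needed today:    ", current)
print("descriptors needed projected:", projected)
print("soft limit now:              ", raise_nofile(current))
```

Even if the limit can be raised that far, this only addresses opening
the shards; it says nothing about the ~30s query time.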

Now I'm thinking, MultiDatabase isn't the right way to go about
this...

Perhaps creating a new, all-encompassing Xapian index with a
reasonable shard count would be wise, at least for the normal
WWW frontend?

Managing removals of entire inboxes from an all-encompassing
Xapian DB would get much trickier.
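
A toy model of why removal gets trickier, in plain Python rather than
the Xapian API.  It assumes a cross-posted message is stored once in
the combined index and tagged with every inbox it appears in; that
layout, the class, and the inbox names are all hypothetical.  With
per-inbox shards, dropping an inbox means dropping its shards; here,
every document has to be visited, and only those left with no inbox
can actually be deleted.

```python
class CombinedIndex:
    """Toy stand-in for one combined index; NOT the Xapian API."""

    def __init__(self):
        self.docs = {}  # message-id -> set of inbox names it belongs to

    def add(self, msgid, inbox):
        """Record that `msgid` appears in `inbox` (stored once per message)."""
        self.docs.setdefault(msgid, set()).add(inbox)

    def remove_inbox(self, inbox):
        """Drop one inbox; return how many documents were deleted outright."""
        deleted = 0
        for msgid in list(self.docs):       # full scan of the index
            inboxes = self.docs[msgid]
            inboxes.discard(inbox)
            if not inboxes:                 # no other inbox still holds it
                del self.docs[msgid]
                deleted += 1
        return deleted

idx = CombinedIndex()
idx.add("a@example", "inbox-one")   # cross-posted message
idx.add("a@example", "inbox-two")
idx.add("b@example", "inbox-one")   # only in inbox-one

deleted = idx.remove_inbox("inbox-one")
print(deleted, idx.docs)
```

The cross-posted message survives because the other inbox still
references it, which is exactly the bookkeeping that per-inbox shards
avoid.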

IMAP search would still require per-mailbox indices, I think,
because UIDs are currently tied to NNTP article numbers.
Some attributes such as INTERNALDATE (Received: time) and
exact byte sizes would differ if the same message is
cross-posted to multiple public mailing lists, too.


