System requirements to boost xapian's performance?

Olly Betts olly at survex.com
Mon Jan 17 01:07:19 GMT 2022


On Sat, Jan 15, 2022 at 11:52:54AM -0400, David Bremner wrote:
> Philip Colmer <philip.colmer at linaro.org> writes:
> >
> > What changes can I make to the specification of a server to best improve
> > the performance of the indexing? For example, if I throw more cores at
> > this, will the indexing go faster?
> 
> I think it will depend a great deal on the indexer, and I don't know
> anything about how mailman uses Xapian. Based on my experience with
> Notmuch I would say strive for fast IO (definitely SSD, perhaps ramdisk)
> and fast single threaded performance. Memory use is usually moderate by
> 2021 standards.

I should include the caveat that I too know nothing specific about
how mailman uses Xapian.

If you're contemplating using a RAM disk, I'd expect (though haven't
benchmarked) that you'd get equivalent gains by instead letting the OS
use that RAM to cache all of a disk-based database, and disabling
syncing of the database for the initial run.  That has the added
benefits that you don't need to create a RAM disk (and so don't need
to decide how big to make it), and don't need to copy the database from
RAM disk to disk once indexing completes.

You can disable syncing by opening the Xapian::WritableDatabase with
the DB_NO_SYNC flag (which may need a tweak to the indexer code if
it doesn't already support doing so), or running the indexing command
under eatmydata.
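
A minimal sketch of the DB_NO_SYNC route using the Xapian Python
bindings, assuming the indexer is written in Python and opens the
database itself.  The function name and the path passed to it are
hypothetical; this is not taken from mailman's code.

```python
def open_without_sync(path):
    """Open (or create) the database at `path` with syncing disabled."""
    # Imported inside the function so this module can still be loaded
    # on a machine where the Xapian bindings are not installed.
    import xapian

    # Flags are combined with bitwise OR.  DB_NO_SYNC trades crash
    # resilience for speed, so it suits an initial run that can simply
    # be restarted from scratch if the machine goes down.
    flags = xapian.DB_CREATE_OR_OPEN | xapian.DB_NO_SYNC
    return xapian.WritableDatabase(path, flags)
```

Once the initial run has finished, the database can be reopened
without the flag so that later updates are synced as usual.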

You can also speed up an initial index run by using DB_DANGEROUS, which
updates database blocks in place rather than doing copy-on-write, and
so reduces the amount of I/O required.  This is even less crash
resilient than disabling syncing.
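
As a rough illustration, DB_DANGEROUS is passed the same way as other
open flags.  The sketch below uses the Python bindings; the function
name and the no_sync parameter are hypothetical, and combining the flag
with DB_NO_SYNC here is an assumption about what suits a throwaway
initial build, not a recommendation from the Xapian documentation.

```python
def open_for_initial_build(path, no_sync=True):
    """Open the database at `path` for a restartable initial index run."""
    # Imported lazily so the module loads without the bindings present.
    import xapian

    # DB_DANGEROUS: update blocks in place instead of copy-on-write.
    flags = xapian.DB_CREATE_OR_OPEN | xapian.DB_DANGEROUS
    if no_sync:
        # Optionally skip syncing to disk as well.
        flags |= xapian.DB_NO_SYNC
    return xapian.WritableDatabase(path, flags)
```

A crash part way through a run opened like this can leave the database
unusable, so it only makes sense when the index can be rebuilt from the
original data.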

Increasing the batch size between automatic flushes can improve
throughput if there's plenty of RAM - set (and export!) the
environment variable XAPIAN_FLUSH_THRESHOLD, which is the number of
changed documents after which an automatic flush happens.  It defaults
to 10000, which is fairly conservative.
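
If the indexer is launched from a script, the variable can be placed in
the child process's environment, which has the same effect as exporting
it in the shell.  A minimal sketch in Python: the helper names are
hypothetical, and 100000 is only a placeholder, since a sensible value
depends on document size and available RAM.

```python
import os
import subprocess


def indexing_env(threshold=100000, base_env=None):
    """Return a copy of the environment with XAPIAN_FLUSH_THRESHOLD set."""
    env = dict(os.environ if base_env is None else base_env)
    # Environment values must be strings.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)
    return env


def run_indexer(command, threshold=100000):
    """Run `command` (a list of arguments) with the raised threshold."""
    # check=True raises CalledProcessError if the indexer exits non-zero.
    return subprocess.run(command, env=indexing_env(threshold), check=True)
```

Here `command` would be whatever argument list starts the indexing run
on your system.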

Using the newest Xapian version available to you may also help.  E.g.
1.4.19 added an optimisation which helps indexing if the indexer runs
queries during indexing.
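
To see which version an installation is actually running, the Python
bindings report the library version as a string.  The sketch below
compares it against 1.4.19; both helper names are hypothetical, and the
parsing assumes the string starts with three dot-separated numbers.

```python
import re


def parse_xapian_version(text):
    """Turn a version string such as '1.4.19' into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (text,))
    return tuple(int(part) for part in match.groups())


def includes_1_4_19_optimisation(version_text=None):
    """Return True if the version is at least 1.4.19."""
    if version_text is None:
        # Imported lazily so the comparison can be used on a plain string
        # without the bindings installed.
        import xapian
        version_text = xapian.version_string()
    # Tuples compare numerically, so 1.4.2 sorts before 1.4.19.
    return parse_xapian_version(version_text) >= (1, 4, 19)
```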

Cheers,
    Olly