compact checkpoints while doing days-long indexing jobs?

Eric Wong e at 80x24.org
Thu Aug 21 02:05:10 BST 2025


Hello, I'm trying to get Xapian to work better on btrfs, which is
prone to fragmentation regardless of whether or not btrfs CoW
is enabled.

Thus, I'm wondering if running xapian-compact occasionally
during a multi-day indexing run can improve indexing performance.

I'll be using the BTRFS_IOC_DEFRAG ioctl to periodically defrag
glass files after some (probably not all) transaction commits.
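
For readers unfamiliar with the ioctl, here is a minimal Python sketch
of issuing BTRFS_IOC_DEFRAG on the table files of one glass shard. It
is an illustration under assumptions, not the code used for this job:
the request number is derived assuming the generic Linux ioctl encoding
(as on x86 and arm; a few architectures differ), a NULL argument is
passed to ask for whole-file defrag with default settings, and the
helper names `glass_files` and `defrag_file` are hypothetical.

```python
import fcntl
import glob
import os

# Assumption: generic Linux _IOW() encoding (direction bits at shift 30).
BTRFS_IOCTL_MAGIC = 0x94
_IOC_WRITE = 1
# sizeof(struct btrfs_ioctl_vol_args): __s64 fd + char name[4088]
VOL_ARGS_SIZE = 4096


def _iow(magic, nr, size):
    """Build an ioctl request number the way the kernel's _IOW() does."""
    return (_IOC_WRITE << 30) | (size << 16) | (magic << 8) | nr


BTRFS_IOC_DEFRAG = _iow(BTRFS_IOCTL_MAGIC, 2, VOL_ARGS_SIZE)


def glass_files(shard_dir):
    """Return the *.glass table files (postlist.glass, etc.) in a shard."""
    return sorted(glob.glob(os.path.join(shard_dir, "*.glass")))


def defrag_file(path):
    """Ask btrfs to defragment one file; raises OSError on failure
    (for example, when the file is not on btrfs)."""
    fd = os.open(path, os.O_RDWR)
    try:
        # 0 is passed as a NULL argument: defrag the whole file.
        fcntl.ioctl(fd, BTRFS_IOC_DEFRAG, 0)
    finally:
        os.close(fd)
```

A caller would run `defrag_file(p)` for each `p` in
`glass_files(shard_dir)` right after a commit, so that no writes are
in flight on the files being defragmented.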

I've noticed that even on small, fresh imports (with few/minimal
deletes) compact can reduce file sizes by 20-60%, so I'm
wondering if compact before btrfs defrag is helpful even if I
intend to add more docs right after the compact+defrag.

My code manually commits every so often (adjustable) to reduce
dirty memory since the default XAPIAN_FLUSH_THRESHOLD=10000 is
too high and I want to keep auxiliary SQLite file(s) synced
w/ Xapian, as well.
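
The commit pattern described above can be sketched as follows. This is
a hedged illustration in Python rather than the actual Perl code: the
`BatchCommitter` class and its parameters are hypothetical, and the
commit callbacks stand in for whatever commits the Xapian shard and
the SQLite file(s).

```python
class BatchCommitter:
    """Run every commit callback after each `commit_every` added docs.

    Hypothetical helper: in a real indexer the callbacks would commit
    the Xapian WritableDatabase and the auxiliary SQLite connection(s),
    in that order, so both stay in sync.
    """

    def __init__(self, commit_every, commit_fns):
        self.commit_every = commit_every
        self.commit_fns = list(commit_fns)
        self.pending = 0  # docs added since the last commit

    def added(self):
        """Record one added document; commit when the batch is full."""
        self.pending += 1
        if self.pending >= self.commit_every:
            self.commit()

    def commit(self):
        """Commit all stores together; a no-op when nothing is pending."""
        if self.pending == 0:
            return
        for fn in self.commit_fns:
            fn()
        self.pending = 0
```

Calling `commit()` once more at the end of the run flushes any
partial batch. A periodic defrag or compact step would fit naturally
right after `commit()` returns.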

I'm dealing with over 20 million docs across 3 (adjustable)
shards in parallel (Perl is probably a bottleneck, too :x).
Document numbers are assigned to shards based on
$NNTP_ARTICLE_NUMBER % $SHARD_COUNT so I rely on --no-renumber.
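
As an illustration of the shard assignment above, a minimal Python
sketch (the function names are hypothetical, and the real code is
Perl):

```python
def shard_for(article_number, shard_count):
    """Pick the shard index for one NNTP article number."""
    return article_number % shard_count


def group_by_shard(article_numbers, shard_count):
    """Split article numbers into one list per shard, keeping order."""
    shards = [[] for _ in range(shard_count)]
    for num in article_numbers:
        shards[shard_for(num, shard_count)].append(num)
    return shards
```

With 3 shards, `group_by_shard(range(1, 8), 3)` gives
`[[3, 6], [1, 4, 7], [2, 5]]`. The mapping only remains valid if
document numbers survive compaction unchanged, which is why
--no-renumber matters here.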

It takes 4-5 days with btrfs CoW disabled (haven't tried CoW,
yet), so it's a PITA to keep a machine quiet for that long since
I have other stuff to do.  Testing with a smaller (faster) data
set doesn't reveal much since fragmentation is mainly noticeable
with giant ones.

Thanks.


