[Xapian-discuss] Update under a large database is slow

Olly Betts olly at survex.com
Wed Sep 12 13:08:51 BST 2007


On Fri, Sep 07, 2007 at 05:23:28PM +0800, Gea-Suan Lin wrote:
> CPU: Intel(R) Xeon(R) CPU            5140  @ 2.33GHz (2327.51-MHz K8-class CPU)
> usable memory = 4285677568 (4087 MB)
> avail memory  = 4124434432 (3933 MB)
> 
> Running FreeBSD 7.0-CURRENT, using Gigabit Ethernet to access NetApp
> with NFSv3.

These NAS systems usually work well, but there can be some gotchas, as
James has already highlighted. (I seem to remember that when I used one
years ago it was faster mounted as NFSv2 than as NFSv3, so it's amusing
that NFSv4 can now work better than NFSv3.)

It would be interesting to build part of the database both on local disk
and on the NAS and compare the timings, to see whether the issue lies
with Xapian or with the NAS.

> > Are you setting XAPIAN_FLUSH_THRESHOLD?
> 
> No, we use the default value.

You might get better indexing performance by increasing it.  It sets the
number of documents for which postlist information is collated in memory
before it is written to disk.  Ideally this would self-tune to a large
extent, but currently it doesn't.  It's hard to say what a good figure
is for a particular setup, especially in your case, since you have a lot
of terms in each document.
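
To make the trade-off concrete, here is a minimal Python sketch of what
a flush threshold does.  It is a hypothetical model, not Xapian's code:
the class `BufferedPostlistWriter` and its attributes are invented for
illustration.  Postings are buffered per term and only merged into the
"on-disk" lists once `flush_threshold` documents have been added, so a
larger threshold means fewer, larger writes at the cost of more memory
(which grows faster when each document has many terms).  With Xapian
itself the threshold is read from the environment, so in practice you
would set XAPIAN_FLUSH_THRESHOLD in the indexer's environment (e.g.
`XAPIAN_FLUSH_THRESHOLD=50000`, value purely illustrative).

```python
from collections import defaultdict


class BufferedPostlistWriter:
    """Hypothetical model of flush-threshold batching (not Xapian's code).

    Postings are collated in memory per term and only written out once
    `flush_threshold` documents have been added since the last flush.
    """

    def __init__(self, flush_threshold: int) -> None:
        if flush_threshold < 1:
            raise ValueError("flush_threshold must be positive")
        self.flush_threshold = flush_threshold
        self.pending = defaultdict(list)   # term -> docids not yet written
        self.on_disk = defaultdict(list)   # stands in for on-disk postlists
        self.docs_since_flush = 0
        self.next_docid = 1
        self.flushes = 0                   # number of batched writes done

    def add_document(self, terms) -> int:
        docid = self.next_docid
        self.next_docid += 1
        # Buffer one posting per distinct term in this document.
        for term in set(terms):
            self.pending[term].append(docid)
        self.docs_since_flush += 1
        # Only touch "disk" once enough documents have accumulated.
        if self.docs_since_flush >= self.flush_threshold:
            self.flush()
        return docid

    def flush(self) -> None:
        self.docs_since_flush = 0
        if not self.pending:
            return
        # One merge per term, however many documents were buffered.
        for term, docids in self.pending.items():
            self.on_disk[term].extend(docids)
        self.pending.clear()
        self.flushes += 1
```

For example, adding 10 documents with `flush_threshold=3` triggers
batched writes after documents 3, 6 and 9, and a final explicit
`flush()` writes out document 10, giving 4 writes in total.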

> We keep the original "small" part databases (about 10k documents per
> db), update them and run xapian-compact to merge them every 12 hours.

Yes, this approach works pretty well.

> btw, it looks like xapian-compact -m (multipass) is slower than without
> -m when merging from ~800 small databases.

Interesting.  It's certainly faster for me when starting from databases
each containing a million documents.  It seems plausible that when
starting from smaller databases it might actually be faster to merge
more at once, because the working set will be smaller (the multipass
option merges 2 at a time, or 3 at the end if there's an odd number, if
I remember correctly).
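
A small Python sketch helps show why the number of inputs matters.  The
function `multipass_schedule` below is a hypothetical model of the
pairing rule described above (2 at a time, with the last 3 merged
together when a pass has an odd number of inputs); it is not taken from
xapian-compact's source.  It returns the group sizes for each pass.
Since each pass roughly rereads and rewrites all the data, 800 inputs
cost 9 passes, whereas a single-pass merge handles the data once but
has to keep all 800 inputs open at the same time.

```python
from typing import List


def multipass_schedule(n_inputs: int) -> List[List[int]]:
    """Return the merge-group sizes for each pass (hypothetical model).

    Databases are merged two at a time; when a pass has an odd number
    of inputs, the last three are merged together instead.
    """
    if n_inputs < 1:
        raise ValueError("need at least one input database")
    passes: List[List[int]] = []
    while n_inputs > 1:
        if n_inputs % 2 == 0:
            groups = [2] * (n_inputs // 2)
        else:
            # Pair up all but the last three, then merge those three.
            groups = [2] * ((n_inputs - 3) // 2) + [3]
        passes.append(groups)
        n_inputs = len(groups)  # each group yields one output database
    return passes


if __name__ == "__main__":
    print(len(multipass_schedule(800)))  # 9
```

The schedule for 800 inputs goes 800, 400, 200, 100, 50, 25, 12, 6, 3
and finally 1 database.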

Cheers,
    Olly
