[Xapian-discuss] Update under a large database is slow

Gea-Suan Lin gslin at gslin.org
Fri Sep 7 10:23:28 BST 2007


On Wed, Sep 05, 2007 at 03:08:32AM +0100, Olly Betts wrote:
> On Thu, Jul 12, 2007 at 10:18:07AM +0800, Gea-Suan Lin wrote:
> > We use Perl module Search::Xapian 1.0.2.0 to index ~4m articles (it's
> > 26GB right now), but updating is slow. (about 4 article/sec with I/O
> > bound)
> 
> What spec is the machine?

CPU: Intel(R) Xeon(R) CPU            5140  @ 2.33GHz (2327.51-MHz K8-class CPU)
usable memory = 4285677568 (4087 MB)
avail memory  = 4124434432 (3933 MB)

Running FreeBSD 7.0-CURRENT, using Gigabit Ethernet to access NetApp
with NFSv3.

> Are you setting XAPIAN_FLUSH_THRESHOLD?

No, we use the default value.
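
For reference, Xapian reads XAPIAN_FLUSH_THRESHOLD from the environment
when a writable database is opened, so it has to be set before the
indexer starts. A minimal sketch of preparing such an environment; the
value 100000 and the helper name indexer_env are only illustrations,
not settings we have tested:

```python
import os


def indexer_env(threshold, base=None):
    """Return a copy of the environment with XAPIAN_FLUSH_THRESHOLD set.

    `threshold` is the number of documents to batch before a flush.
    `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    env["XAPIAN_FLUSH_THRESHOLD"] = str(int(threshold))
    return env


# Hypothetical usage: pass the result as env= to subprocess.run()
# when launching the indexing script.
env = indexer_env(100000, base={})
```

A larger threshold trades memory for fewer, larger flushes, so whether
it helps here would depend on how much of the 4GB the batch needs.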

> > The articles are UTF-8 CJK, we use bigram to generate terms, so it's
> > very easy to generate ~10k terms for a mid-size article. The article
> > itself is not stored in Xapian, but only the terms.
> 
> That is a lot more terms than is typical, so I'd expect indexing to be
> slower, but 4 per second is very slow.
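
To show why the term count grows so quickly, here is a minimal sketch
of overlapping bigram generation. It is not our actual tokenizer: the
function names are made up, and it only treats the basic CJK Unified
Ideographs block (U+4E00 to U+9FFF) as CJK.

```python
def _pairs(run):
    """Overlapping two-character terms for one run of CJK characters."""
    if len(run) == 1:
        # A lone character is kept as a single-character term.
        return [run[0]]
    return [run[i] + run[i + 1] for i in range(len(run) - 1)]


def cjk_bigrams(text):
    """Return bigram terms for every run of CJK ideographs in `text`.

    Characters outside U+4E00..U+9FFF end the current run and are
    otherwise ignored.
    """
    terms = []
    run = []
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":
            run.append(ch)
        else:
            terms.extend(_pairs(run))
            run = []
    terms.extend(_pairs(run))
    return terms
```

A run of n characters gives n-1 term occurrences, so the number of
postings per article is close to its length in characters.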

For now we use another approach to save time.

We keep the original small partial databases (about 10k per database),
update those, and run xapian-compact every 12 hours to merge them.

By the way, xapian-compact -m (multipass) looks slower than running
without -m when merging ~800 small databases.
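
The periodic merge amounts to one xapian-compact call with all the
source databases followed by the destination. A minimal sketch of
building that command line; the directory names and the helper
compact_command are hypothetical, and the command is only constructed
here, not run:

```python
def compact_command(part_dirs, dest, multipass=False):
    """Build the argument list for merging `part_dirs` into `dest`.

    xapian-compact takes the source databases first and the
    destination database last; -m selects multipass merging.
    """
    cmd = ["xapian-compact"]
    if multipass:
        cmd.append("-m")
    cmd.extend(part_dirs)
    cmd.append(dest)
    return cmd


# Hypothetical layout: part-0000, part-0001, ... merged into "merged".
parts = ["part-%04d" % i for i in range(3)]
single_pass = compact_command(parts, "merged")
multi_pass = compact_command(parts, "merged", multipass=True)
```

Timing both variants with the same inputs (for example by passing each
list to subprocess.run) is how the two modes could be compared.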

-- 
* Gea-Suan Lin  (public key: Using https://keyserver.pgp.com/ to search)
* If you cannot convince them, confuse them.           -- Harry S Truman
