[Xapian-discuss] My new record: Indexing 20 millions docs = 79m9.378s

Olly Betts olly at survex.com
Fri Feb 9 06:07:11 GMT 2007


On Wed, Feb 07, 2007 at 01:21:06PM -0800, Kevin Duraj wrote:
> Gentoo Linux 2.6
> 8 AMD Opteron 64-bit Processors
> 32GB Memory
> --------------------------------------------------------------------------------
> 
> Environment:
> ------------------
> XAPIAN_FLUSH_THRESHOLD=21000000
> XAPIAN_FLUSH_THRESHOLD_LENGTH=16000000

Setting XAPIAN_FLUSH_THRESHOLD_LENGTH no longer does anything (it was
removed in September 2004).

> PS: In my scenario after 25 million records the indexing significantly
> slows down (2x-4x) I do not know why? Could it be because of the
> B-Tree become very complex?

That seems unlikely: the depth of a B-tree grows only logarithmically
with the number of entries.
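
To put rough numbers on the logarithmic growth, here is a minimal sketch
that counts how many levels a B-tree needs for a given number of entries.
The function name `btree_levels` and the fanout of 100 are assumptions made
purely for illustration; they are not Xapian's actual figures, which depend
on block size and key length.

```python
def btree_levels(entries: int, fanout: int) -> int:
    """Smallest number of levels L such that fanout**L >= entries.

    'fanout' is a hypothetical branching factor (children per block),
    not a value taken from Xapian.
    """
    levels = 1
    capacity = fanout  # entries reachable with the current number of levels
    while capacity < entries:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    # Compare tree depth just before and just after the reported slowdown.
    for n in (20_000_000, 25_000_000, 1_000_000_000):
        print(n, btree_levels(n, fanout=100))
```

Under that assumed fanout, 20 million and 25 million entries both need 4
levels, and even a billion entries need only 5, so tree depth alone would
not account for a 2x-4x slowdown at that point.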

It's probably a cache effect: as the working set of a process grows,
performance can suddenly get worse once it no longer quite fits in the
available CPU cache.  In your case, I suspect the issue is with some key
subset of the working set.
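
The "sudden" part can be illustrated with a toy model. The sketch below
simulates a least-recently-used cache under a repeated sequential scan; it
is not a model of a real CPU cache or of Xapian, and the name
`lru_hit_rate` and the slot counts are hypothetical. It only shows how a
working set that is one item too large can turn nearly all hits into misses.

```python
from collections import OrderedDict


def lru_hit_rate(working_set: int, cache_slots: int, passes: int = 10) -> float:
    """Hit rate of an LRU cache when cycling over 'working_set' distinct keys."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    accesses = 0
    for _ in range(passes):
        for key in range(working_set):
            accesses += 1
            if key in cache:
                hits += 1
                cache.move_to_end(key)  # mark as most recently used
            else:
                cache[key] = True
                if len(cache) > cache_slots:
                    cache.popitem(last=False)  # evict least recently used
    return hits / accesses


if __name__ == "__main__":
    # Working sets just inside and just outside the assumed cache size.
    for size in (500, 1000, 1001):
        print(size, lru_hit_rate(size, cache_slots=1000))
```

With 1000 slots, a working set of 1000 keys hits on every pass after the
first (a rate of 0.9 over 10 passes), while 1001 keys gives a hit rate of
0.0, because each key is evicted just before it is needed again. Real
caches and real access patterns are less extreme, but the cliff has the
same cause.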

When indexing, do you only call WritableDatabase::add_document()?  If
so, we should be able to index significantly faster than this by
buffering appended changes in a more compact way.

Cheers,
    Olly


