[Xapian-discuss] My new record: Indexing 20 millions docs = 79m9.378s
Olly Betts
olly at survex.com
Fri Feb 9 06:07:11 GMT 2007
On Wed, Feb 07, 2007 at 01:21:06PM -0800, Kevin Duraj wrote:
> Gentoo Linux 2.6
> 8 AMD Opteron 64-bit Processors
> 32GB Memory
> --------------------------------------------------------------------------------
>
> Environment:
> ------------------
> XAPIAN_FLUSH_THRESHOLD=21000000
> XAPIAN_FLUSH_THRESHOLD_LENGTH=16000000
Setting XAPIAN_FLUSH_THRESHOLD_LENGTH no longer does anything (it was
removed in September 2004).
> PS: In my scenario after 25 million records the indexing significantly
> slows down (2x-4x) I do not know why? Could it be because of the
> B-Tree become very complex?
That seems unlikely: the depth of a B-tree grows only logarithmically
with the number of entries.
It's probably a cache effect - as the working set of a process grows,
performance can suddenly get worse when it just fails to fit in the
available CPU cache. In your case, I suspect it's some key subset of
the working set which is the issue.
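
To put a rough number on the logarithmic growth mentioned above, here is
a minimal sketch that counts how many B-tree levels are needed for a
given number of entries. The function name btree_levels and the fanout
of 100 entries per block are assumptions made purely for illustration;
they are not Xapian's actual figures.

```python
def btree_levels(n_entries: int, fanout: int) -> int:
    """Return the number of levels a B-tree needs to hold n_entries,
    assuming every block holds exactly `fanout` entries or child pointers.

    Both the function and the fanout are hypothetical, for illustration only.
    """
    if n_entries < 1 or fanout < 2:
        raise ValueError("need n_entries >= 1 and fanout >= 2")
    levels = 1
    capacity = fanout  # entries addressable with a single level
    # Each extra level multiplies the addressable entries by the fanout.
    while capacity < n_entries:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    for n in (1_000_000, 20_000_000, 25_000_000, 1_000_000_000):
        print(n, btree_levels(n, fanout=100))
```

Under that assumed fanout, 20 million and 25 million entries need the
same number of levels, so tree depth alone would not explain a 2x-4x
slowdown between those two sizes.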
When indexing, do you only call WritableDatabase::add_document()? If
so, we should be able to index significantly faster than this by
buffering appended changes in a more compact way.
Cheers,
Olly