[Xapian-discuss] time to build index

Olly Betts olly at survex.com
Wed Oct 15 14:58:26 BST 2008


On Wed, Oct 15, 2008 at 02:16:15PM +0200, Jeroen van Dijk wrote:
> The indexing process got to 1.2 million records and then it lost the
> connection (my own fault, I guess) after 16 hours and had built up an
> indexing database of around 300MB.
> 
> Should I be suspicious or should I just wait a little longer?

That seems rather slow.  It depends on the data and the hardware, but
I'd expect more like a million documents per hour.
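
For scale, the gap between the reported rate and that rough expectation
can be worked out from the figures quoted above.  This is only a
back-of-the-envelope sketch: the variable names are illustrative, and the
"expected" figure is a ballpark that depends on the data and the hardware.

```python
# Figures quoted in this thread.
docs_indexed = 1_200_000   # records indexed before the connection was lost
hours_elapsed = 16         # wall-clock time taken to get that far

# Ballpark expectation only; the real figure depends on data and hardware.
expected_docs_per_hour = 1_000_000

observed_docs_per_hour = docs_indexed / hours_elapsed
slowdown = expected_docs_per_hour / observed_docs_per_hour

print(f"observed: {observed_docs_per_hour:,.0f} documents/hour")
print(f"roughly {slowdown:.0f}x slower than the ballpark expectation")
```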

If you aren't already, try setting XAPIAN_FLUSH_THRESHOLD in the
environment to a value higher than the default of 10000.  The best value
depends on the nature of the data and how much memory you have, but
1000000 is worth a try.
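
As a minimal sketch of one way to do that, assuming the indexer is
launched as a separate process: the variable has to be in the indexer's
environment before it starts.  The helper names and the "./my-indexer"
command below are hypothetical placeholders, not part of Xapian; only the
XAPIAN_FLUSH_THRESHOLD name and the value 1000000 come from this message.

```python
import os
import subprocess


def indexer_env(flush_threshold=1_000_000, base_env=None):
    """Return a copy of the environment with XAPIAN_FLUSH_THRESHOLD set.

    base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Environment values must be strings.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(flush_threshold)
    return env


def run_indexer(command, flush_threshold=1_000_000):
    """Run an indexer command (a list of arguments) with the threshold set."""
    return subprocess.run(command, env=indexer_env(flush_threshold), check=True)


# Hypothetical usage; replace the command with your own indexer:
# run_indexer(["./my-indexer", "--db", "/path/to/db"])
```

A larger threshold means more changes are held in memory between
flushes, so it is worth watching memory use while trying different
values.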

I've just realised that we don't actually seem to document
XAPIAN_FLUSH_THRESHOLD anywhere, which probably explains why I have to
keep highlighting it on the mailing list!  I'll write up something...

Cheers,
    Olly


