[Xapian-discuss] what about the efficiency of building indexes

Olly Betts olly at survex.com
Thu Apr 29 06:01:49 BST 2010


On Mon, Apr 26, 2010 at 09:20:53AM +0100, Tom Mortimer wrote:
> One thing that has a big impact on indexing performance is how often
> you are flushing/committing changes. You should generally do this as
> infrequently as possible. Flushing after every document will make
> things run very slow.

If you are indexing everything in a single go and aren't flushing explicitly,
Xapian flushes automatically, by default every 10000 documents added,
modified, or removed.  This is quite conservative - set XAPIAN_FLUSH_THRESHOLD
in the environment to override this - e.g.:

XAPIAN_FLUSH_THRESHOLD=1000000 omindex --db /path/to/db /srv/www
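
If you run several indexing commands from the same shell, exporting the
variable once may be more convenient than prefixing each command.  A rough
sketch, assuming the same placeholder paths as above and that 1000000 suits
your machine (a higher threshold means more changes are buffered in memory
between flushes, so treat the value as something to tune, not a recommendation):

```shell
# Assumption: 1000000 is only the illustrative value from the example above.
export XAPIAN_FLUSH_THRESHOLD=1000000

# Run the indexer only if omindex is installed and the source directory exists;
# /path/to/db and /srv/www are placeholders, not real locations.
if command -v omindex >/dev/null 2>&1 && [ -d /srv/www ]; then
    omindex --db /path/to/db /srv/www
else
    echo "omindex not run; XAPIAN_FLUSH_THRESHOLD=$XAPIAN_FLUSH_THRESHOLD"
fi
```

The setting only lasts for the current shell session, so later runs from a
fresh shell go back to the default unless you export it again.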

If you have suitable hardware, it's certainly possible to index 3 million
documents in well under an hour.

Cheers,
    Olly


