[Xapian-discuss] what about the efficiency of building indexes
Olly Betts
olly at survex.com
Thu Apr 29 06:01:49 BST 2010
On Mon, Apr 26, 2010 at 09:20:53AM +0100, Tom Mortimer wrote:
> One thing that has a big impact on indexing performance is how often
> you are flushing/committing changes. You should generally do this as
> infrequently as possible. Flushing after every document will make
> things run very slow.
If you are indexing everything in a single go and aren't flushing explicitly,
Xapian flushes automatically, by default after every 10000 documents added,
modified, or removed. This default is quite conservative; to override it, set
XAPIAN_FLUSH_THRESHOLD in the environment, e.g.:
XAPIAN_FLUSH_THRESHOLD=1000000 omindex --db /path/to/db /srv/www
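To make the effect of the threshold concrete, here is a minimal Python model of
threshold-based auto-flushing. It is not Xapian's implementation:
`BatchingWriter`, `flush_threshold_from_env` and `count_commits` are
hypothetical names, the commit callback stands in for a real database commit,
and falling back to 10000 when XAPIAN_FLUSH_THRESHOLD is unset or not a
positive integer is an assumption of the sketch. It only counts how many
commits an indexing run would perform.

```python
import os
from typing import Callable, List, Mapping, Optional

# Default threshold as described in the post; the fallback rules below
# are an assumption of this model, not Xapian's exact parsing logic.
DEFAULT_FLUSH_THRESHOLD = 10000


def flush_threshold_from_env(env: Optional[Mapping[str, str]] = None) -> int:
    """Read XAPIAN_FLUSH_THRESHOLD, falling back to the default."""
    env = os.environ if env is None else env
    try:
        n = int(env.get("XAPIAN_FLUSH_THRESHOLD", ""))
    except ValueError:
        return DEFAULT_FLUSH_THRESHOLD
    return n if n > 0 else DEFAULT_FLUSH_THRESHOLD


class BatchingWriter:
    """Hypothetical model: buffer changes, commit once the threshold is hit."""

    def __init__(self, commit: Callable[[List[str]], None], threshold: int):
        self._commit = commit
        self._threshold = threshold
        self._pending: List[str] = []

    def change(self, doc_id: str) -> None:
        # Each add, modify, or delete counts as one pending change.
        self._pending.append(doc_id)
        if len(self._pending) >= self._threshold:
            self.flush()

    def flush(self) -> None:
        # Commit whatever is buffered; a no-op if nothing is pending.
        if self._pending:
            self._commit(self._pending)
            self._pending = []


def count_commits(n_docs: int, threshold: int) -> int:
    """Return how many commits indexing n_docs documents would trigger."""
    commits: List[int] = []
    writer = BatchingWriter(lambda batch: commits.append(len(batch)), threshold)
    for i in range(n_docs):
        writer.change(f"doc{i}")
    writer.flush()  # final flush for any remainder, as when closing the db
    return len(commits)


if __name__ == "__main__":
    n = 100_000
    print("flush after every document:", count_commits(n, 1))
    print("default threshold:", count_commits(n, DEFAULT_FLUSH_THRESHOLD))
    print("threshold 1000000:", count_commits(n, 1_000_000))
```

With a threshold of 1, 100000 changes cost 100000 commits; at the 10000
default they cost 10; at 1000000 there is a single commit at the end. Since
each commit carries a fixed overhead, fewer and larger batches are what make
bulk indexing fast, at the price of holding more pending changes in memory.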
If you have suitable hardware, it's certainly possible to index 3 million
documents in well under an hour.
Cheers,
Olly