[Xapian-discuss] Indexing and committing

Olly Betts olly at survex.com
Mon Apr 16 16:29:56 BST 2007


On Mon, Apr 16, 2007 at 04:19:43PM +0200, Andreas Marienborg wrote:
> Right now I fetch items from the database, add them to the index,  
> with terms etc,  and commit every 1000 documents.
> 
> Does this sound reasonable, or should I wait until the end and  
> commit then, or something else altogether?

Unless memory is tight, you are likely to get better throughput by
flushing larger batches less often.  Batches of a million or even
more can make sense for some applications.
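
As a rough illustration of the batching idea above, here is a minimal
sketch in Python.  It is not tied to the Xapian API: `index_in_batches`,
`add` and `commit` are hypothetical names, and the caller is assumed to
pass in whatever adds one document and whatever flushes pending changes
(with the Xapian bindings that would presumably be the writable
database's `add_document` and its `flush`/`commit` method).

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def index_in_batches(
    items: Iterable[T],
    add: Callable[[T], None],
    commit: Callable[[], None],
    batch_size: int = 1_000_000,
) -> int:
    """Add every item, committing once per `batch_size` items.

    Returns the number of commits performed.  `add` and `commit` are
    supplied by the caller; nothing here depends on a real index.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    pending = 0   # items added since the last commit
    commits = 0
    for item in items:
        add(item)
        pending += 1
        if pending >= batch_size:
            commit()
            commits += 1
            pending = 0

    # Flush whatever is left over from the final, partial batch.
    if pending:
        commit()
        commits += 1
    return commits
```

Raising `batch_size` trades memory for fewer, larger flushes, so it is
worth measuring a few values against your own data rather than taking
the default here as a recommendation.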

> What I think I am seeing is that it takes longer and longer for each  
> commit, but that might be something else altogether?

It will generally take longer to flush a batch to a larger database.

If you're trying to build a really large database from scratch or add a
large batch to an existing database, and you don't care about having
partial results available for searching, it is likely to be fastest
to index the new documents into one or more new databases, and then
use xapian-compact to merge the existing database and all the new
ones into a replacement for the existing database.
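
The merge step described above could be driven from a script along
these lines.  This is only a sketch: the function names and database
paths are made up for illustration, and it assumes xapian-compact is on
your PATH and is invoked as source databases followed by the
destination.  Moving the merged result into place of the existing
database afterwards is a separate step and isn't shown.

```python
import subprocess
from typing import List, Sequence


def build_compact_command(
    existing_db: str, new_dbs: Sequence[str], replacement_db: str
) -> List[str]:
    """Build the xapian-compact argument list: sources first, destination last."""
    if not new_dbs:
        raise ValueError("expected at least one new database to merge")
    return ["xapian-compact", existing_db, *new_dbs, replacement_db]


def merge_databases(
    existing_db: str, new_dbs: Sequence[str], replacement_db: str
) -> None:
    """Run the merge; raises CalledProcessError if xapian-compact fails."""
    cmd = build_compact_command(existing_db, new_dbs, replacement_db)
    subprocess.run(cmd, check=True)
```

For example, `merge_databases("main.db", ["batch1.db", "batch2.db"],
"main.new")` would merge all three sources into `main.new` (all of
these paths are hypothetical).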

Cheers,
    Olly


