[Xapian-discuss] Suitability of Xapian for my application?

Olly Betts olly at survex.com
Mon Oct 18 17:26:02 BST 2004


On Mon, Oct 18, 2004 at 08:38:21AM -0700, Eric Parusel wrote:
> You wouldn't happen to have done any benchmarks on flushing every 
> document, would you?

I haven't - I've only looked at the effect of batching from 1000
upwards.

There's some overhead from the committing mechanism - it requires
writing a file with the updated bitmap for each table.

When adding new documents, your updates will typically be spread across
the Btree, which is harsh on disk block caching.  You also end up
writing more blocks: with batching, several updates to the same posting
list will typically land in the same block and so get written once, and
updates to parent blocks are shared in a similar way.  If you've lots of
RAM, this may not be a problem.  For the size of database you're looking
at, it's quite feasible to have enough RAM to potentially have the whole
database cached.
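
To make the block-sharing argument concrete, here is a toy model rather
than a measurement.  It assumes each document can be reduced to the set
of Btree block numbers its updates touch, and that a flush writes every
distinct block dirtied since the previous flush.  It ignores parent
blocks and the per-table bitmap file, and blocks_written() is a made-up
name, not anything in Xapian.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy model (hypothetical, not Xapian code): each inner vector lists the
// block numbers one document's updates touch.  A flush writes each
// distinct block dirtied since the previous flush.  Returns the total
// number of block writes when flushing every batch_size documents.
std::size_t blocks_written(const std::vector<std::vector<int>>& docs,
                           std::size_t batch_size) {
    assert(batch_size > 0);
    std::size_t total = 0;
    std::set<int> dirty;       // blocks dirtied since the last flush
    std::size_t in_batch = 0;  // documents added since the last flush
    for (const auto& doc : docs) {
        dirty.insert(doc.begin(), doc.end());
        if (++in_batch == batch_size) {
            total += dirty.size();  // flush: write each dirty block once
            dirty.clear();
            in_batch = 0;
        }
    }
    return total + dirty.size();  // flush any final partial batch
}
```

With documents that share blocks, a batch size of 1 counts every shared
block once per document, while a larger batch counts it once per flush.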

If you're replacing documents, you also end up having to resplice
posting lists more.

One further issue with very frequent updates is that if 2 or more
updates happen during a read operation, then the operation will fail
with DatabaseModifiedError and have to be retried.  If you are really
hammering updates in, it may never be possible to complete a search!
There are plans to change the versioning scheme to eliminate
DatabaseModifiedError entirely, but as things stand it's an issue.
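
A retry loop for this might look roughly like the sketch below.  So that
it compiles without Xapian, ModifiedError is a stand-in for
Xapian::DatabaseModifiedError, and search_with_retry() is a hypothetical
helper.  In real code the reopen callback would call Database::reopen()
on the database being searched.

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for Xapian::DatabaseModifiedError so the sketch is
// self-contained; real code would catch the Xapian exception instead.
struct ModifiedError : std::runtime_error {
    ModifiedError() : std::runtime_error("database modified") {}
};

// Hypothetical helper: run search(); if it throws ModifiedError, call
// reopen() and try again, rethrowing once max_attempts have failed.
template <typename Search, typename Reopen>
auto search_with_retry(Search search, Reopen reopen, int max_attempts)
    -> decltype(search()) {
    assert(max_attempts > 0);
    for (int attempt = 1; ; ++attempt) {
        try {
            return search();
        } catch (const ModifiedError&) {
            if (attempt >= max_attempts) throw;  // updates keep winning
            reopen();  // move to the latest revision before retrying
        }
    }
}
```

Capping the number of attempts matters here: under a heavy enough
update load the search may never get through, so the caller needs to
see the failure rather than spin forever.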

You could just try it and see.  If the indexer struggles to keep up
at peak times, you can then look at batching updates.

> I *would* be able to re-add documents potentially, I'm just not sure how 
> I'd know when the last flush/checkpoint was?

By setting the autoflush threshold very high and explicitly calling
WritableDatabase::flush() yourself (probably using thresholds on the
number of documents and on elapsed time).
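
As a rough sketch of the "documents and time" thresholds, assuming the
autoflush threshold has been set high enough that it never triggers:
FlushPolicy below is a hypothetical helper, not part of Xapian, and it
only decides when a flush is due.  The caller still makes the actual
WritableDatabase::flush() call.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper (not part of Xapian): tracks unflushed documents
// and says when an explicit flush is due, by count or by elapsed time.
class FlushPolicy {
  public:
    using Clock = std::chrono::steady_clock;

    FlushPolicy(std::size_t max_docs, std::chrono::seconds max_age)
        : max_docs_(max_docs), max_age_(max_age), pending_(0),
          last_flush_(Clock::now()) {
        assert(max_docs > 0);
    }

    // Record one added or replaced document.  Returns true if the
    // caller should flush now.
    bool document_added(Clock::time_point now = Clock::now()) {
        ++pending_;
        return due(now);
    }

    // True if there are unflushed documents and either threshold is hit.
    bool due(Clock::time_point now = Clock::now()) const {
        if (pending_ == 0) return false;
        return pending_ >= max_docs_ || now - last_flush_ >= max_age_;
    }

    // Call after a successful flush.  Documents counted before this
    // point are on disk; later ones would need re-adding after a crash.
    void flushed(Clock::time_point now = Clock::now()) {
        pending_ = 0;
        last_flush_ = now;
    }

    std::size_t pending() const { return pending_; }

  private:
    std::size_t max_docs_;
    std::chrono::seconds max_age_;
    std::size_t pending_;
    Clock::time_point last_flush_;
};
```

In the indexer you'd call document_added() after each add or replace,
and when it returns true call WritableDatabase::flush() followed by
flushed().  Since you make the flush call yourself, you know exactly
which documents were covered by the last one.  It's also worth checking
due() on a timer so that a quiet period doesn't leave a few documents
unflushed indefinitely.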

Cheers,
    Olly


