[Xapian-discuss] Weekly replacement of documents.

Olly Betts olly at survex.com
Tue Jun 26 17:05:03 BST 2007


On Tue, Jun 26, 2007 at 05:23:13AM +0000, David wrote:
> But, every week we get new data, and most documents will have to be
> Xapian::WritableDatabase::replace_document()'d. What type of effect would this
> have?

Note that if most documents have changed, it'll probably be
significantly faster to just rebuild the database from scratch, provided
you have a full copy of the current data rather than just the delta.
Appending lots of documents to a database is particularly well
optimised, and probably inherently faster than replacing them anyway.

But if you're happy with the update speed, there's no problem with
replacing lots of documents.

> Since the majority of the database will, in effect, be "replaced" on a weekly
> basis, how does the database re-organize itself?

Blocks are normally kept between 50% and 100% full.  The exception is
deletion: we don't currently coalesce blocks when entries are removed,
so a block can end up less than half full.  In practice, that doesn't
seem to matter - the next big update will fill most of them up again,
and totally empty blocks are released for reuse.

> Would I have to do some sort of compacting?

You can run xapian-compact to eliminate any currently unused blocks and
fill blocks fuller (typically 95-100% full with the default options).
A freshly compacted database is also especially fast to search, at
least until the next update.

If you plan to update further, 95-100% full blocks mean the next few
updates will cause a lot of block-splitting so "xapian-compact -n" might
be a better option, as this stops it trying to cram all the blocks so
full.  I haven't profiled whether this actually helps, though.  It would
be interesting to know.

Cheers,
    Olly
