[Xapian-discuss] problem on closing writable databases

Olly Betts olly at survex.com
Mon Feb 23 10:20:25 GMT 2009


On Fri, Feb 20, 2009 at 04:33:05PM +0100, Markus Wörle wrote:
> I am not sure whether this is a bug or not. I constantly add, remove,
> and replace (with replace_document_by_term) a huge number of documents.
> I occasionally even replace all documents in an index. Whenever I do
> that, the index grows slightly, becomes slow over time, wastes RAM
> (cached disk blocks), etc. After compacting, it all starts again from
> the beginning.

Well, deleting many documents will result in a lot of under-utilised and
unused blocks (as I said in the response you quote above), but this
space should get reused when more data is added.

A second issue (which I didn't mention there) is that when you make a
lot of changes between flushes, the changed blocks are rewritten to new
blocks and then all switched live at once (essentially like
copy-on-write).  So if you change 25% of the database in a single
transaction, then the database will grow by 25% (assuming there were no
unused blocks, and block utilisation was typical) and after committing,
20% (25 out of 125) of the blocks will be unused by the latest revision.
But if you change a similar amount again, it should reuse that 20%.

That could be a factor in what you're seeing.
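
To make the arithmetic above concrete, here is a rough Python sketch of the block accounting. It is only a toy model, not Xapian's actual code: the function name `commit` and the idea of counting blocks as plain integers are assumptions for illustration, and it assumes every block freed by one commit is available for reuse by the next.

```python
def commit(total_blocks, free_blocks, changed_blocks):
    """Toy model of one copy-on-write commit (hypothetical helper).

    Each changed block is written to a new location. Free blocks are
    reused first, and the file only grows for the remainder.
    Returns (total_blocks, free_blocks) after the commit.
    """
    reused = min(free_blocks, changed_blocks)
    appended = changed_blocks - reused
    total_blocks += appended
    # The old copies of the changed blocks are unused by the new revision.
    free_blocks = free_blocks - reused + changed_blocks
    return total_blocks, free_blocks


# Start with 100 fully used blocks and change 25% of them.
total, free = commit(100, 0, 25)
print(total, free, free / total)

# Change a similar amount again: the unused blocks get reused.
total, free = commit(total, free, 25)
print(total, free)
```

In this model the first commit grows the file from 100 to 125 blocks with 25 unused (20%), and the second commit leaves the size at 125.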

If there is a bug here, it's probably not just us "leaking" blocks as
xapian-check compares the blocks in the tree to the free block list
IIRC.
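
The kind of consistency check I mean can be sketched as follows. This is not xapian-check's actual implementation, just an illustration of the idea using plain sets of block numbers; `find_leaked_blocks` and its parameters are hypothetical names.

```python
def find_leaked_blocks(total_blocks, tree_blocks, free_list):
    """Compare blocks reachable from the tree with the free list.

    Hypothetical helper: returns (leaked, double_booked), where
    `leaked` holds blocks that are neither in the tree nor marked free,
    and `double_booked` holds blocks that are in the tree AND marked free.
    """
    all_blocks = set(range(total_blocks))
    tree = set(tree_blocks)
    free = set(free_list)
    leaked = all_blocks - tree - free
    double_booked = tree & free
    return leaked, double_booked


# Block 4 is in neither set, so it would count as leaked.
leaked, double_booked = find_leaked_blocks(6, [0, 1, 3], [2, 5])
print(leaked, double_booked)
```

If both sets come back empty, every block is accounted for exactly once.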

Ideally we'd allow the file to shrink when blocks at the end are unused
(by the latest two revisions).  We could potentially relocate blocks
at the end to earlier unused blocks to assist this.  That might often
be quicker (though less effective) than xapian-compact, which copies
all the key/tag pairs into a new Btree table.
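
A minimal sketch of the relocation idea, purely as an illustration: nothing like this exists in Xapian as far as this message says, and `relocate_tail_blocks` is a made-up name. It ignores the need to update pointers in parent blocks and treats "unused" as a simple set membership test, rather than checking the latest two revisions.

```python
def relocate_tail_blocks(used, total_blocks):
    """Plan moves of used blocks near the end into earlier free blocks.

    Hypothetical helper. `used` is an iterable of in-use block numbers.
    Returns (moves, new_total): `moves` maps old block number to new
    block number, and `new_total` is the file size in blocks once the
    now-unused tail is truncated.
    """
    used = set(used)
    free = sorted(set(range(total_blocks)) - used)
    moves = {}
    next_free = 0
    # Walk used blocks from the end of the file backwards.
    for src in sorted(used, reverse=True):
        # Stop once no free block lies before the current used block.
        if next_free >= len(free) or free[next_free] >= src:
            break
        moves[src] = free[next_free]
        next_free += 1
    final_used = (used - set(moves)) | set(moves.values())
    new_total = max(final_used) + 1 if final_used else 0
    return moves, new_total


# Blocks 4 and 7 move into the holes at 2 and 3; the file shrinks to 4 blocks.
moves, new_total = relocate_tail_blocks({0, 1, 4, 7}, 8)
print(moves, new_total)
```

Only the blocks past the final size get copied here, which is why this could be cheaper than rewriting the whole table, though it does nothing about under-utilised blocks.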

Cheers,
    Olly


