Fuller compaction (was Re: [Xapian-discuss] Xapian and quartz scalability - feedback of current users)

Olly Betts olly at survex.com
Tue Mar 22 21:21:47 GMT 2005


On Tue, Mar 22, 2005 at 12:54:38PM +0000, Olly Betts wrote:
> Incidentally, I'll be adding a "fuller" compaction option soon which
> will allow item chunks to be larger after compaction.  Currently they're
> limited to allow at least 4 in a block, which is generally good for a
> database you want to update, but does add some overhead.  It should help
> record (if document data is ever more than about 2000 bytes), termlist,
> and position tables.  It's hard to estimate how much until I try it
> though...
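
A rough sketch of where the "about 2000 bytes" figure comes from, assuming an 8K (8192 byte) block size and ignoring per-block and per-item overhead. The function name and parameters here are hypothetical, for illustration only, and are not taken from the quartz source:

```python
def max_item_size(block_size, min_items_per_block=4, per_block_overhead=0):
    """Largest item that still lets min_items_per_block items share a block.

    per_block_overhead is a placeholder for header/directory bytes;
    it defaults to 0 because the real figure is not given here.
    """
    return (block_size - per_block_overhead) // min_items_per_block


# With an assumed 8K block and at least 4 items per block, each item
# is capped at roughly a quarter of the block.
print(max_item_size(8192))

# Requiring fewer items per block raises the cap (illustrative value only).
print(max_item_size(8192, min_items_per_block=2))
```

Anything bigger than that cap has to be split across several items, which is where the extra overhead comes from.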

I just had a quick fiddle:

    postlist: Reduced by 0.224934% 640K (284528K -> 283888K)
    record: Reduced by 0.392561% 488K (124312K -> 123824K)
    termlist: Reduced by 0.40976% 1064K (259664K -> 258600K)
    position: Size unchanged (0K)
    value: Size unchanged (0K)
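
The percentages are just the saving divided by the original size. A minimal sketch that recomputes them from the sizes above (the helper name is made up for illustration):

```python
def reduction_percent(before_kb, after_kb):
    """Return the size reduction as a percentage of the original size."""
    return (before_kb - after_kb) / before_kb * 100.0


# Table sizes in K before and after compaction, copied from the figures above.
sizes = {
    "postlist": (284528, 283888),
    "record": (124312, 123824),
    "termlist": (259664, 258600),
}

for table, (before, after) in sizes.items():
    saved = before - after
    pct = reduction_percent(before, after)
    print(f"{table}: Reduced by {pct:.6g}% {saved}K ({before}K -> {after}K)")
```

So all three tables that shrank did so by well under half a percent.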

I'm surprised the postlist table benefits - we're meant to pick a chunk
size such that the Btree doesn't need to split tags itself.

This makes me wonder if I've got the code wrong, but the compacted
database passes quartzcheck, which now checks more of the structure.
So perhaps the chunk size we use is too high.  But enough speculation -
I'll check through when I'm less tired.

Cheers,
    Olly


