[Xapian-discuss] weak populated b-trees?

Markus Wörle mrks at mrks.de
Fri Sep 5 19:14:21 BST 2008


Hi,

I just ran xapian-compact on an index which consumes about 12 GB of
disk space, containing 858.383 documents with an average doclength of
169.018, and was surprised by a much larger compaction factor than I
had expected. After compaction, the index needs only 3.8 GB on disk.

My expectation was that it would shrink by only about 25% or so, because
I expected the average fill of the b-tree blocks to be about 75%.

This is what xapian-compact said:

postlist: Reduced by 76.888% 2444640K (3179480K -> 734840K)
record: Reduced by 65.4923% 1446352K (2208432K -> 762080K)
termlist: Reduced by 67.2607% 1110312K (1650760K -> 540448K)
position: Reduced by 56.6145% 2342160K (4137032K -> 1794872K)
value: Reduced by 81.2667% 397264K (488840K -> 91576K)
spelling: Size unchanged (0K)
synonym: Size unchanged (0K)
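
To put the per-table figures next to the 75% expectation, here is a rough sketch that parses the report above, totals it, and estimates how full the blocks were before compaction. It assumes that compaction leaves the blocks close to 100% full, so that after/before approximates the old fill level; that is my assumption, not something xapian-compact reports. The names REPORT, LINE and parse_report are made up for this illustration.

```python
import re

# The xapian-compact report, copied verbatim from above.
REPORT = """\
postlist: Reduced by 76.888% 2444640K (3179480K -> 734840K)
record: Reduced by 65.4923% 1446352K (2208432K -> 762080K)
termlist: Reduced by 67.2607% 1110312K (1650760K -> 540448K)
position: Reduced by 56.6145% 2342160K (4137032K -> 1794872K)
value: Reduced by 81.2667% 397264K (488840K -> 91576K)
spelling: Size unchanged (0K)
synonym: Size unchanged (0K)
"""

# Matches only the "Reduced by" lines; "Size unchanged" lines are skipped.
LINE = re.compile(r"^(\w+): Reduced by [\d.]+% \d+K \((\d+)K -> (\d+)K\)$")


def parse_report(text):
    """Return {table: (before_kb, after_kb)} for every table that shrank."""
    sizes = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            sizes[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return sizes


sizes = parse_report(REPORT)
total_before = sum(before for before, _ in sizes.values())
total_after = sum(after for _, after in sizes.values())
total_reduction = 1 - total_after / total_before

for name, (before, after) in sizes.items():
    # Assumption: compacted blocks are ~100% full, so this ratio
    # approximates the average fill level before compaction.
    print(f"{name:9s} implied fill before compaction: {after / before:.1%}")

print(f"total: {total_before}K -> {total_after}K "
      f"(reduced by {total_reduction:.1%})")
```

The totals come to 11664544K before and 3923816K after, which matches the roughly 12 GB and 3.8 GB mentioned above. Under the stated assumption, the blocks were on average only about one third full, far below the 75% I had expected.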

My index's brief history:

The index was once built from scratch with add_document(), and was then
updated by a large number of replace_document_by_term(),
add_document(), and delete_document_by_term() calls over a longer period
(about 2 months or so). Some numbers: about 1 million modifications per
day, of which about 4000 are document adds and 3000 are removes.
Additionally, in this 2-month period, all documents were rebuilt about
5 times by using replace_document_by_term() on a unique term for each
document.

So my question is: Is this reasonable? Or rather, do you have any
idea why my b-trees are so empty? Does Xapian ever merge weakly
populated blocks again?

I am currently planning to stop indexing once a day to run
xapian-compact, but I am uncertain whether this would "denature" the
system. I have many modifications, and although "best indexing
performance" is not really a concern in my use case, I feel somehow bad
about manipulating a naturally balanced b-tree in a non-changing
environment. What do you suggest?
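
For concreteness, the daily step I have in mind could look roughly like the sketch below. It only assumes the basic xapian-compact invocation (source database first, destination last). The function names and paths are hypothetical, and it relies on the indexer being stopped before it runs.

```python
import os
import subprocess


def compact_command(live_db, compacted_db):
    """Build the xapian-compact call: source database, then destination."""
    return ["xapian-compact", live_db, compacted_db]


def compact_and_swap(live_db, scratch_db, old_db):
    """Compact live_db into scratch_db, then swap the directories.

    Assumes the indexer is stopped, that scratch_db and old_db do not
    exist yet, and that all three paths are on the same filesystem so
    that the renames are cheap.
    """
    # check=True raises CalledProcessError if xapian-compact fails,
    # so the live database is left untouched on error.
    subprocess.run(compact_command(live_db, scratch_db), check=True)
    os.rename(live_db, old_db)      # keep the bloated copy as a fallback
    os.rename(scratch_db, live_db)  # the compacted copy becomes the live one


# Hypothetical usage, with made-up paths:
# compact_and_swap("/data/index", "/data/index.compact", "/data/index.old")
```

Searchers that still have the old database open would presumably have to reopen it after the swap.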

Thanks,
mrks

