Xapian 1.3.5 snapshot performance and index size

Jean-Francois Dockes jf at dockes.org
Sat Apr 30 14:04:36 BST 2016


Olly Betts writes:
 > On Tue, Apr 12, 2016 at 11:28:52AM +0200, Jean-Francois Dockes wrote:
 > > Olly Betts writes:
 > >  > Ideally we'd find a way to make it come out more compact to start with.
 > >  > 
 > >  > One thing which could help is making glass more willing to switch to
 > >  > "sequential mode".  If you fancy some more benchmarking, you could
 > >  > try changing SEQ_START_POINT in backends/glass/glass_table.cc.
 > >  > 
 > >  > It defaults to -10, but I don't think anyone has tried tuning it
 > >  > recently (this value comes from Martin's original code in commit
 > >  > 26bd647ff6084c60d8869f27d6abbd99e06c3f30 back in 2000 - he may have done
 > >  > tests to select it, but even if he did, so much has changed since).
 > >  > Something like -3 or -4 might work well - probably enough that it
 > >  > shouldn't enable when it's not useful, and by default we ensure at least
 > >  > 4 items fit in a block.
 > > 
 > > Ok, I tried this, with not much luck.
 > 
 > Many thanks for taking a look at this.
 > 
 > If you have the databases from your test around still, what's the
 > size of the tables in one of them after compaction?  It shouldn't
 > make a difference which version of the output database you compact to
 > find this.
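To show what "switching to sequential mode" changes, here is a toy Python model. It is loosely based on our reading of the B-tree split heuristic that SEQ_START_POINT controls, and it is not Xapian's code. A counter starts at SEQ_START_POINT (a negative number). It is incremented each time a key is inserted directly after the previously inserted key, and it is reset otherwise. Once it reaches 0, a full block is split at the insertion point instead of at its midpoint, so the left block stays full. The functions `simulate` and `fill_factor`, the fixed key count per block, and the single level of blocks are all hypothetical simplifications.

```python
from bisect import bisect_left

# Hypothetical toy model, NOT Xapian's code: one level of leaf blocks,
# each holding at most `capacity` sorted keys.


def simulate(keys, capacity, seq_start_point):
    """Insert keys one at a time; return the resulting list of blocks.

    seq_start_point must be negative (negative counter = normal mode).
    """
    assert seq_start_point < 0
    blocks = [[]]                  # each block is a sorted list of keys
    seq_count = seq_start_point
    last_key = None                # key added by the previous insertion
    for key in keys:
        # Choose the last block whose first key is <= key (block 0 if none).
        i = 0
        for j, b in enumerate(blocks):
            if b and b[0] <= key:
                i = j
        blk = blocks[i]
        pos = bisect_left(blk, key)
        # Sequential insertion: the new key lands directly after the
        # previously inserted key. Anything else resets the counter.
        if last_key is not None and pos > 0 and blk[pos - 1] == last_key:
            if seq_count < 0:
                seq_count += 1
        else:
            seq_count = seq_start_point
        blk.insert(pos, key)
        last_key = key
        if len(blk) > capacity:
            # Normal mode: split at the midpoint (both halves about half full).
            # Sequential mode: split at the insertion point (left stays full).
            m = len(blk) // 2 if seq_count < 0 else pos
            blocks[i:i + 1] = [blk[:m], blk[m:]]
    return blocks


def fill_factor(blocks, capacity):
    """Fraction of the available key slots that are actually used."""
    return sum(len(b) for b in blocks) / (len(blocks) * capacity)


if __name__ == "__main__":
    keys = range(1000)                 # purely ascending keys
    # -10**9 effectively disables sequential mode for this input.
    for start in (-3, -10, -10**9):
        blocks = simulate(keys, capacity=8, seq_start_point=start)
        print(start, len(blocks), round(fill_factor(blocks, 8), 3))
```

With 1000 ascending keys and 8 keys per block, the model gives 125 blocks (100% full) for -3, 126 blocks (about 99% full) for -10, and 249 blocks (about 50% full) with sequential mode disabled. A purely ascending input is the easy case. The thread's question is how a smaller threshold behaves when real indexing interleaves sequential and non-sequential updates, and this toy does not answer that.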

Hi,

Below are the table sizes before and after compaction, for Xapian 1.3.5
(glass backend) and 1.2.21 (chert backend).

I re-ran the indexing script after changing SEQ_START_POINT, probably on a
slightly different but equivalent data set (a bunch of PDFs). The bad news
is that I could not reproduce the earlier results, which had shown a small
but consistent variation of index size with SEQ_START_POINT. In the re-runs,
the size variations were somewhat smaller (though of the same order of
magnitude) and did not follow any obvious pattern.

I can't explain the change in behaviour, other than by having been lucky
the first time, which seems odd. In any case, since the variations were not
very significant to begin with (around 1% of the full index size), I've
stopped trying.

Regards,

jf




hm1$ xapian-compact-1.3 .recoll/xapiandb/ .recoll/xapiandb-compacted
postlist: Reduced by 63% 70584K (112016K -> 41432K)
docdata: Reduced by 1% 24K (1888K -> 1864K)
termlist: Reduced by 24% 9016K (36760K -> 27744K)
position: Reduced by 58% 278088K (475960K -> 197872K)
spelling: doesn't exist
synonym: Reduced by 42% 3840K (8936K -> 5096K)
hm1$ ls -l .recoll/xapiandb*
.recoll/xapiandb:
total 635576
-rw-r--r-- 1 dockes dockes   1933312 Apr 30 14:01 docdata.glass
-rw-r--r-- 1 dockes dockes         0 Apr 30 14:01 flintlock
-rw-r--r-- 1 dockes dockes       145 Apr 30 14:01 iamglass
-rw-r--r-- 1 dockes dockes 487383040 Apr 30 14:01 position.glass
-rw-r--r-- 1 dockes dockes 114704384 Apr 30 14:01 postlist.glass
-rw-r--r-- 1 dockes dockes   9150464 Apr 30 14:01 synonym.glass
-rw-r--r-- 1 dockes dockes  37642240 Apr 30 14:01 termlist.glass

.recoll/xapiandb-compacted:
total 274016
-rw-r--r-- 1 dockes dockes   1908736 Apr 30 14:10 docdata.glass
-rw-r--r-- 1 dockes dockes         0 Apr 30 14:10 flintlock
-rw-r--r-- 1 dockes dockes       134 Apr 30 14:11 iamglass
-rw-r--r-- 1 dockes dockes 202620928 Apr 30 14:11 position.glass
-rw-r--r-- 1 dockes dockes  42426368 Apr 30 14:10 postlist.glass
-rw-r--r-- 1 dockes dockes   5218304 Apr 30 14:11 synonym.glass
-rw-r--r-- 1 dockes dockes  28409856 Apr 30 14:10 termlist.glass


The same for Xapian 1.2.21:

hm1$ xapian-compact .recoll/xapiandb/ .recoll/xapiandb-compacted
postlist: Reduced by 63% 78528K (123912K -> 45384K)
record: Reduced by 2% 48K (1904K -> 1856K)
termlist: Reduced by 25% 9432K (37096K -> 27664K)
position: Reduced by 0% 656K (220904K -> 220248K)
spelling: doesn't exist
synonym: Reduced by 46% 4848K (10464K -> 5616K)
hm1$ ls -l .recoll/xapiandb*
.recoll/xapiandb:
total 394336
-rw-r--r-- 1 dockes dockes         0 Apr 30 14:18 flintlock
-rw-r--r-- 1 dockes dockes        28 Apr 30 14:12 iamchert
-rw-r--r-- 1 dockes dockes      3473 Apr 30 14:18 position.baseA
-rw-r--r-- 1 dockes dockes      3473 Apr 30 14:18 position.baseB
-rw-r--r-- 1 dockes dockes 226205696 Apr 30 14:18 position.DB
-rw-r--r-- 1 dockes dockes      1954 Apr 30 14:18 postlist.baseA
-rw-r--r-- 1 dockes dockes      1954 Apr 30 14:18 postlist.baseB
-rw-r--r-- 1 dockes dockes 126885888 Apr 30 14:18 postlist.DB
-rw-r--r-- 1 dockes dockes        46 Apr 30 14:18 record.baseA
-rw-r--r-- 1 dockes dockes        46 Apr 30 14:18 record.baseB
-rw-r--r-- 1 dockes dockes   1949696 Apr 30 14:18 record.DB
-rw-r--r-- 1 dockes dockes       182 Apr 30 14:18 synonym.baseA
-rw-r--r-- 1 dockes dockes       182 Apr 30 14:18 synonym.baseB
-rw-r--r-- 1 dockes dockes  10715136 Apr 30 14:18 synonym.DB
-rw-r--r-- 1 dockes dockes       597 Apr 30 14:18 termlist.baseA
-rw-r--r-- 1 dockes dockes       597 Apr 30 14:18 termlist.baseB
-rw-r--r-- 1 dockes dockes  37986304 Apr 30 14:18 termlist.DB

.recoll/xapiandb-compacted:
total 300816
-rw-r--r-- 1 dockes dockes        28 Apr 30 14:19 iamchert
-rw-r--r-- 1 dockes dockes        13 Apr 30 14:19 position.baseA
-rw-r--r-- 1 dockes dockes      3462 Apr 30 14:19 position.baseB
-rw-r--r-- 1 dockes dockes 225533952 Apr 30 14:19 position.DB
-rw-r--r-- 1 dockes dockes        13 Apr 30 14:19 postlist.baseA
-rw-r--r-- 1 dockes dockes       728 Apr 30 14:19 postlist.baseB
-rw-r--r-- 1 dockes dockes  46473216 Apr 30 14:19 postlist.DB
-rw-r--r-- 1 dockes dockes        13 Apr 30 14:19 record.baseA
-rw-r--r-- 1 dockes dockes        44 Apr 30 14:19 record.baseB
-rw-r--r-- 1 dockes dockes   1900544 Apr 30 14:19 record.DB
-rw-r--r-- 1 dockes dockes        13 Apr 30 14:19 synonym.baseA
-rw-r--r-- 1 dockes dockes       105 Apr 30 14:19 synonym.baseB
-rw-r--r-- 1 dockes dockes   5750784 Apr 30 14:19 synonym.DB
-rw-r--r-- 1 dockes dockes        13 Apr 30 14:19 termlist.baseA
-rw-r--r-- 1 dockes dockes       450 Apr 30 14:19 termlist.baseB
-rw-r--r-- 1 dockes dockes  28327936 Apr 30 14:19 termlist.DB
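The two xapian-compact summaries are easier to compare side by side. The sketch below parses the "Reduced by" lines exactly as printed above and computes, for each table present under the same name in both backends, the glass/chert size ratio before and after compaction. The regular expression only covers the line shape shown here. Other forms, such as "spelling: doesn't exist", are skipped. The function name `parse_compact_summary` is hypothetical.

```python
import re

# Matches summary lines of the form printed above, e.g.
#   postlist: Reduced by 63% 70584K (112016K -> 41432K)
# Any other line shape is ignored.
LINE_RE = re.compile(r"^(\w+): Reduced by \d+% \d+K \((\d+)K -> (\d+)K\)$")


def parse_compact_summary(text):
    """Return {table: (size_before_K, size_after_K)} from the summary text."""
    sizes = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            sizes[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return sizes


# Output copied from the xapian-compact runs above.
GLASS_135 = """\
postlist: Reduced by 63% 70584K (112016K -> 41432K)
docdata: Reduced by 1% 24K (1888K -> 1864K)
termlist: Reduced by 24% 9016K (36760K -> 27744K)
position: Reduced by 58% 278088K (475960K -> 197872K)
spelling: doesn't exist
synonym: Reduced by 42% 3840K (8936K -> 5096K)
"""

CHERT_1221 = """\
postlist: Reduced by 63% 78528K (123912K -> 45384K)
record: Reduced by 2% 48K (1904K -> 1856K)
termlist: Reduced by 25% 9432K (37096K -> 27664K)
position: Reduced by 0% 656K (220904K -> 220248K)
spelling: doesn't exist
synonym: Reduced by 46% 4848K (10464K -> 5616K)
"""

if __name__ == "__main__":
    glass = parse_compact_summary(GLASS_135)
    chert = parse_compact_summary(CHERT_1221)
    # Compare only tables with the same name in both outputs; glass's
    # "docdata" and chert's "record" are named differently and are skipped.
    for table in sorted(glass.keys() & chert.keys()):
        (gb, ga), (cb, ca) = glass[table], chert[table]
        print(f"{table:9} before: {gb / cb:.2f}  after: {ga / ca:.2f}")
```

On these figures, the glass position table is about 2.15 times the size of chert's before compaction but about 10% smaller after it. The postlist, termlist and synonym tables are broadly similar in size in both backends, before and after compaction.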


