Xapian 1.3.5 snapshot performance and index size
Jean-Francois Dockes
jf at dockes.org
Sat Apr 30 14:04:36 BST 2016
Olly Betts writes:
> On Tue, Apr 12, 2016 at 11:28:52AM +0200, Jean-Francois Dockes wrote:
> > Olly Betts writes:
> > > Ideally we'd find a way to make it come out more compact to start with.
> > >
> > > One thing which could help is making glass more willing to switch to
> > > "sequential mode". If you fancy some more benchmarking, you could
> > > try changing SEQ_START_POINT in backends/glass/glass_table.cc.
> > >
> > > It defaults to -10, but I don't think anyone has tried tuning it
> > > recently (this value comes from Martin's original code in commit
> > > 26bd647ff6084c60d8869f27d6abbd99e06c3f30 back in 2000 - he may have done
> > > tests to select it, but even if he did, so much has changed since).
> > > Something like -3 or -4 might work well - probably enough that it
> > > shouldn't enable when it's not useful, and by default we ensure at least
> > > 4 items fit in a block.
> >
> > Ok, I tried this, with not much luck.
>
> Many thanks for taking a look at this.
>
> If you have the databases from your test around still, what's the
> size of the tables in one of them after compaction? It shouldn't
> make a difference which version of the output database you compact to
> find this.
Hi,

Here follow the table sizes before and after compaction, for xapian 1.3.5
and 1.2.21.

I re-ran the indexing script after changing SEQ_START_POINT, probably on a
slightly different but equivalent data set (a bunch of PDFs). The bad news
is that I could not reproduce the earlier results, which showed a small but
consistent variation of index size with SEQ_START_POINT. In the re-runs,
the size variations are somewhat smaller (though of the same order) and
don't follow any obvious pattern.

I can't explain the change in behaviour, other than by having been lucky
the first time, which seems strange. However, given that the variations
were not significant to begin with (around 1% of the full index size),
I've stopped trying.

Regards,
jf

hm1$ xapian-compact-1.3 .recoll/xapiandb/ .recoll/xapiandb-compacted
postlist: Reduced by 63% 70584K (112016K -> 41432K)
docdata: Reduced by 1% 24K (1888K -> 1864K)
termlist: Reduced by 24% 9016K (36760K -> 27744K)
position: Reduced by 58% 278088K (475960K -> 197872K)
spelling: doesn't exist
synonym: Reduced by 42% 3840K (8936K -> 5096K)
hm1$ ls -l .recoll/xapiandb*
.recoll/xapiandb:
total 635576
-rw-r--r-- 1 dockes dockes 1933312 Apr 30 14:01 docdata.glass
-rw-r--r-- 1 dockes dockes 0 Apr 30 14:01 flintlock
-rw-r--r-- 1 dockes dockes 145 Apr 30 14:01 iamglass
-rw-r--r-- 1 dockes dockes 487383040 Apr 30 14:01 position.glass
-rw-r--r-- 1 dockes dockes 114704384 Apr 30 14:01 postlist.glass
-rw-r--r-- 1 dockes dockes 9150464 Apr 30 14:01 synonym.glass
-rw-r--r-- 1 dockes dockes 37642240 Apr 30 14:01 termlist.glass
.recoll/xapiandb-compacted:
total 274016
-rw-r--r-- 1 dockes dockes 1908736 Apr 30 14:10 docdata.glass
-rw-r--r-- 1 dockes dockes 0 Apr 30 14:10 flintlock
-rw-r--r-- 1 dockes dockes 134 Apr 30 14:11 iamglass
-rw-r--r-- 1 dockes dockes 202620928 Apr 30 14:11 position.glass
-rw-r--r-- 1 dockes dockes 42426368 Apr 30 14:10 postlist.glass
-rw-r--r-- 1 dockes dockes 5218304 Apr 30 14:11 synonym.glass
-rw-r--r-- 1 dockes dockes 28409856 Apr 30 14:10 termlist.glass
Same for xapian 1.2.21:
hm1$ xapian-compact .recoll/xapiandb/ .recoll/xapiandb-compacted
postlist: Reduced by 63% 78528K (123912K -> 45384K)
record: Reduced by 2% 48K (1904K -> 1856K)
termlist: Reduced by 25% 9432K (37096K -> 27664K)
position: Reduced by 0% 656K (220904K -> 220248K)
spelling: doesn't exist
synonym: Reduced by 46% 4848K (10464K -> 5616K)
hm1$ ls -l .recoll/xapiandb*
.recoll/xapiandb:
total 394336
-rw-r--r-- 1 dockes dockes 0 Apr 30 14:18 flintlock
-rw-r--r-- 1 dockes dockes 28 Apr 30 14:12 iamchert
-rw-r--r-- 1 dockes dockes 3473 Apr 30 14:18 position.baseA
-rw-r--r-- 1 dockes dockes 3473 Apr 30 14:18 position.baseB
-rw-r--r-- 1 dockes dockes 226205696 Apr 30 14:18 position.DB
-rw-r--r-- 1 dockes dockes 1954 Apr 30 14:18 postlist.baseA
-rw-r--r-- 1 dockes dockes 1954 Apr 30 14:18 postlist.baseB
-rw-r--r-- 1 dockes dockes 126885888 Apr 30 14:18 postlist.DB
-rw-r--r-- 1 dockes dockes 46 Apr 30 14:18 record.baseA
-rw-r--r-- 1 dockes dockes 46 Apr 30 14:18 record.baseB
-rw-r--r-- 1 dockes dockes 1949696 Apr 30 14:18 record.DB
-rw-r--r-- 1 dockes dockes 182 Apr 30 14:18 synonym.baseA
-rw-r--r-- 1 dockes dockes 182 Apr 30 14:18 synonym.baseB
-rw-r--r-- 1 dockes dockes 10715136 Apr 30 14:18 synonym.DB
-rw-r--r-- 1 dockes dockes 597 Apr 30 14:18 termlist.baseA
-rw-r--r-- 1 dockes dockes 597 Apr 30 14:18 termlist.baseB
-rw-r--r-- 1 dockes dockes 37986304 Apr 30 14:18 termlist.DB
.recoll/xapiandb-compacted:
total 300816
-rw-r--r-- 1 dockes dockes 28 Apr 30 14:19 iamchert
-rw-r--r-- 1 dockes dockes 13 Apr 30 14:19 position.baseA
-rw-r--r-- 1 dockes dockes 3462 Apr 30 14:19 position.baseB
-rw-r--r-- 1 dockes dockes 225533952 Apr 30 14:19 position.DB
-rw-r--r-- 1 dockes dockes 13 Apr 30 14:19 postlist.baseA
-rw-r--r-- 1 dockes dockes 728 Apr 30 14:19 postlist.baseB
-rw-r--r-- 1 dockes dockes 46473216 Apr 30 14:19 postlist.DB
-rw-r--r-- 1 dockes dockes 13 Apr 30 14:19 record.baseA
-rw-r--r-- 1 dockes dockes 44 Apr 30 14:19 record.baseB
-rw-r--r-- 1 dockes dockes 1900544 Apr 30 14:19 record.DB
-rw-r--r-- 1 dockes dockes 13 Apr 30 14:19 synonym.baseA
-rw-r--r-- 1 dockes dockes 105 Apr 30 14:19 synonym.baseB
-rw-r--r-- 1 dockes dockes 5750784 Apr 30 14:19 synonym.DB
-rw-r--r-- 1 dockes dockes 13 Apr 30 14:19 termlist.baseA
-rw-r--r-- 1 dockes dockes 450 Apr 30 14:19 termlist.baseB
-rw-r--r-- 1 dockes dockes 28327936 Apr 30 14:19 termlist.DB
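
To make the two xapian-compact reports above easier to compare, here is a minimal sketch that sums the per-table sizes. The helper names (parse_compact_report, totals) are made up for illustration, and the regular expression assumes the report lines look exactly like the ones pasted above; it is not part of Xapian or of the test script used for these runs.

```python
import re

# Matches lines such as "postlist: Reduced by 63% 70584K (112016K -> 41432K)".
# Assumption: the report format is exactly the one shown in the transcripts.
LINE_RE = re.compile(r"^(\w+): Reduced by \d+% \d+K \((\d+)K -> (\d+)K\)$")


def parse_compact_report(text):
    """Return {table: (before_k, after_k)} for every table that was compacted."""
    sizes = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:  # lines like "spelling: doesn't exist" do not match and are skipped
            sizes[m.group(1)] = (int(m.group(2)), int(m.group(3)))
    return sizes


def totals(sizes):
    """Sum the before and after sizes (in K) over all tables."""
    before = sum(b for b, _ in sizes.values())
    after = sum(a for _, a in sizes.values())
    return before, after


# Report lines copied from the xapian 1.3.5 (glass) run.
GLASS_REPORT = """\
postlist: Reduced by 63% 70584K (112016K -> 41432K)
docdata: Reduced by 1% 24K (1888K -> 1864K)
termlist: Reduced by 24% 9016K (36760K -> 27744K)
position: Reduced by 58% 278088K (475960K -> 197872K)
spelling: doesn't exist
synonym: Reduced by 42% 3840K (8936K -> 5096K)
"""

# Report lines copied from the xapian 1.2.21 (chert) run.
CHERT_REPORT = """\
postlist: Reduced by 63% 78528K (123912K -> 45384K)
record: Reduced by 2% 48K (1904K -> 1856K)
termlist: Reduced by 25% 9432K (37096K -> 27664K)
position: Reduced by 0% 656K (220904K -> 220248K)
spelling: doesn't exist
synonym: Reduced by 46% 4848K (10464K -> 5616K)
"""

glass_before, glass_after = totals(parse_compact_report(GLASS_REPORT))
chert_before, chert_after = totals(parse_compact_report(CHERT_REPORT))

print(f"glass: {glass_before}K -> {glass_after}K")
print(f"chert: {chert_before}K -> {chert_after}K")
print(f"glass/chert before compaction: {glass_before / chert_before:.2f}")
print(f"glass/chert after compaction:  {glass_after / chert_after:.2f}")
```

With the figures quoted above, the totals come to 635560K -> 274008K for glass and 394280K -> 300768K for chert. The uncompacted glass database is thus about 1.6 times the size of the chert one, while the compacted glass database is about 9% smaller. Most of the gap before compaction is in the position table (475960K against 220904K).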