[Xapian-discuss] Compressed Btrees
Arjen van der Meijden
arjen at glas.its.tudelft.nl
Sat Dec 11 09:53:39 GMT 2004
On 9-12-2004 11:03, Olly Betts wrote:
> On Thu, Dec 09, 2004 at 10:43:39AM +0100, Arjen van der Meijden wrote:
>
>>I'll test it with our database, using your hybrid settings, perhaps
>>position_DB is another good candidate to run in filtered-mode?
>
>
> Very likely. If it's not too long a process for your databases, you can just
> compress each table each of the 3 ways and mix and match the results by copying
> (say) record_* from one compacted directory to another.
>
> Incidentally, quartzcompact now reports some statistics for the size reduction
> achieved for each table.
But it only reports those statistics on record_ and value_.
I'm done testing; here are the results. It took about 9-10 hours to
compact and compress the database on an IDE-disk machine. I'll see how
long it takes on the SCSI machine tomorrow; with 0.8.3 it took a bit
over 2 hours 15 minutes.
Not compressed/compacted:
total 14G
-rw-r--r-- 1 acm users 7343988736 Dec 9 18:55 position_DB
-rw-r--r-- 1 acm users 3636887552 Dec 9 19:03 postlist_DB
-rw-r--r-- 1 acm users 335282176 Dec 9 19:04 record_DB
-rw-r--r-- 1 acm users 3188170752 Dec 9 19:18 termlist_DB
-rw-r--r-- 1 acm users 73367552 Dec 9 19:19 value_DB
Normally compacted (this was with 0.8.3; I didn't record the sizes in bytes):
total 9.6G
-rw-r--r-- 1 root root 6.3G Dec 9 08:09 position_DB
-rw-r--r-- 1 root root 1.5G Dec 9 06:19 postlist_DB
-rw-r--r-- 1 root root 228M Dec 9 06:00 record_DB
-rw-r--r-- 1 root root 1.6G Dec 9 06:49 termlist_DB
-rw-r--r-- 1 root root 56M Dec 9 08:09 value_DB
In the same units, the compressed positions are about 6.3G, the postlists
about 1.2/1.3G, the termlists about 1.1/1.2G, record about 160M and value
about 49M.
Compacted and zlib in default mode:
total 8.8G
-rw-r--r-- 1 root root 6729023488 Dec 10 04:58 position_DB
-rw-r--r-- 1 root root 1298120704 Dec 9 20:53 postlist_DB
-rw-r--r-- 1 root root 169009152 Dec 9 19:38 record_DB
-rw-r--r-- 1 root root 1148092416 Dec 9 22:16 termlist_DB
-rw-r--r-- 1 root root 50266112 Dec 10 05:03 value_DB
Compacted and zlib in filtered mode:
total 8.8G
-rw-r--r-- 1 root root 6730301440 Dec 10 14:26 position_DB
-rw-r--r-- 1 root root 1274216448 Dec 10 06:28 postlist_DB
-rw-r--r-- 1 root root 167747584 Dec 10 05:13 record_DB
-rw-r--r-- 1 root root 1177985024 Dec 10 07:50 termlist_DB
-rw-r--r-- 1 root root 50610176 Dec 10 14:32 value_DB
Compacted and zlib in huffman mode:
total 8.9G
-rw-r--r-- 1 root root 6736855040 Dec 11 00:21 position_DB
-rw-r--r-- 1 root root 1274421248 Dec 10 16:13 postlist_DB
-rw-r--r-- 1 root root 171991040 Dec 10 14:42 record_DB
-rw-r--r-- 1 root root 1219551232 Dec 10 17:36 termlist_DB
-rw-r--r-- 1 root root 52543488 Dec 11 00:26 value_DB
The differences in size between the three zlib modes are rather marginal,
but the most compact result would be achieved with:
Record: filtered
Postlist: filtered
Termlist: default
Position: default
Value: default
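The per-table choice above can be re-derived from the listings. The following is only a sketch: the byte sizes are copied from the three compressed listings in this message, while the dictionary layout and the function names are made up for illustration and are not part of quartzcompact.

```python
# Byte sizes copied from the three "Compacted and zlib in ... mode" listings.
SIZES = {
    "default": {
        "position": 6729023488,
        "postlist": 1298120704,
        "record": 169009152,
        "termlist": 1148092416,
        "value": 50266112,
    },
    "filtered": {
        "position": 6730301440,
        "postlist": 1274216448,
        "record": 167747584,
        "termlist": 1177985024,
        "value": 50610176,
    },
    "huffman": {
        "position": 6736855040,
        "postlist": 1274421248,
        "record": 171991040,
        "termlist": 1219551232,
        "value": 52543488,
    },
}


def best_mode_per_table(sizes):
    """Return {table: mode}, picking the mode with the smallest file per table."""
    tables = next(iter(sizes.values())).keys()
    return {t: min(sizes, key=lambda mode: sizes[mode][t]) for t in tables}


def hybrid_total(sizes, choice):
    """Total size in bytes of a database mixing the chosen mode for each table."""
    return sum(sizes[mode][table] for table, mode in choice.items())


if __name__ == "__main__":
    choice = best_mode_per_table(SIZES)
    for table, mode in sorted(choice.items()):
        print(f"{table}: {mode}")
    total = hybrid_total(SIZES, choice)
    print(f"hybrid total: {total} bytes ({total / 2**30:.2f} GiB)")
```

Such a hybrid would be assembled the way Olly suggests: by copying the files of each table from the compacted directory that gave the smallest result.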
However, it may be more efficient not to compress position_DB at all,
since the gain seems too small to justify the extra CPU time: rounded,
all four compacted versions are 6.3G in size.
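For anyone who wants to try the three modes on a sample of their own data before compacting a whole database, here is a minimal sketch using Python's zlib bindings. It assumes that "default", "filtered" and "huffman" correspond to zlib's Z_DEFAULT_STRATEGY, Z_FILTERED and Z_HUFFMAN_ONLY; how quartzcompact itself calls zlib is not shown in this message, and the sample data and function names are hypothetical.

```python
import zlib

# Assumed mapping from the mode names used in this thread to zlib strategies.
STRATEGIES = {
    "default": zlib.Z_DEFAULT_STRATEGY,
    "filtered": zlib.Z_FILTERED,
    "huffman": zlib.Z_HUFFMAN_ONLY,
}


def compress_with(data, mode):
    """Compress data with the zlib strategy registered under the given mode name."""
    comp = zlib.compressobj(
        level=zlib.Z_DEFAULT_COMPRESSION,
        method=zlib.DEFLATED,
        wbits=zlib.MAX_WBITS,
        memLevel=zlib.DEF_MEM_LEVEL,
        strategy=STRATEGIES[mode],
    )
    return comp.compress(data) + comp.flush()


def compare_modes(data):
    """Return {mode: compressed size in bytes} for one sample of data."""
    return {mode: len(compress_with(data, mode)) for mode in STRATEGIES}


if __name__ == "__main__":
    # Hypothetical sample; replace with bytes taken from a real table.
    sample = b"term:example position:1 position:17 position:42\n" * 200
    for mode, size in compare_modes(sample).items():
        print(f"{mode}: {len(sample)} -> {size} bytes")
```

The ratios on a small sample will not necessarily match those of a full table, so this is only a quick way to see whether a mode is worth a complete run.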
I didn't test with dictionaries and the like, since I don't fully
understand how to collect the data for and build a good dictionary.
(Olly, if you'd like to experiment with that yourself, contact me off-list.)
Best regards,
Arjen van der Meijden