[Xapian-discuss] Sanity check on database size

Jean-Francois Dockes jean-francois.dockes at wanadoo.fr
Wed Apr 5 09:18:52 BST 2006


Hello,

I am indexing an email message store with a custom tool (recoll).

The mail store has around 235 MB of data, with 245 000 messages.  It is
indexed into a quartz database with xapian 0.9.2.

When I dump the termlist database, I find around 1.1 million terms (no
stemming and a lot of garbage indexed); the dump is approximately 25 MB
in size. 'delve' says that the average doc size is 540.

The xapian database is around 2.4 GB before compaction, 1.6 GB after.

The termlist_DB file is around 480 MB before compaction, 280 MB after.

I tried to copy the db to a flint one (with XAPIAN_PREFER_FLINT=yes
copydatabase ...), but this doesn't seem to make a significant difference.

I know that xapian handles vastly bigger document sets, so I'm a bit
surprised by how big the database is. In my usual experience on mixed
sets of documents, I had found the xapian database to be approximately the
same size as the document set.
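
To put numbers on this, here is a rough Python sketch of the ratios implied
by the figures above. It assumes decimal units (1 MB = 10^6 bytes, 1 GB =
10^9 bytes), and the variable names are only illustrative.

```python
# Back-of-the-envelope ratios from the figures quoted in this message.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).
MB = 10**6
GB = 10**9

mail_store_bytes = 235 * MB        # size of the mail store
message_count = 245_000            # number of messages indexed
db_bytes_before = 2.4 * GB         # whole database, before compaction
db_bytes_after = 1.6 * GB          # whole database, after compaction
termlist_bytes_after = 280 * MB    # termlist_DB file, after compaction

# Average size of one message in the source data.
bytes_per_message = mail_store_bytes / message_count

# How many times bigger the database is than the data it indexes.
db_ratio_before = db_bytes_before / mail_store_bytes
db_ratio_after = db_bytes_after / mail_store_bytes

# Termlist storage per document after compaction.
termlist_bytes_per_doc = termlist_bytes_after / message_count

print(f"average message size:   {bytes_per_message:.0f} bytes")
print(f"db / data (before):     {db_ratio_before:.1f}x")
print(f"db / data (after):      {db_ratio_after:.1f}x")
print(f"termlist bytes per doc: {termlist_bytes_per_doc:.0f}")
```

Under those assumptions the messages average a bit under 1 KB each, and the
database is roughly 10 times the size of the data before compaction and
roughly 7 times after.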

Is the bigger database size due to the small average document size? Does
this look normal, or am I probably doing something weird/wrong?

Also, both my indexing process and the copydatabase one seem to be using
around 150 MB of memory (memory usage slowly increases up to this
value). Is this amount a (possibly tunable) constant or, if it's variable,
what determines it?


Regards,
Jean-Francois Dockes


