[Xapian-discuss] Expected IndexSize/DataSize Ratio

Olly Betts olly at survex.com
Sat Sep 13 06:03:22 BST 2008


On Fri, Sep 05, 2008 at 11:53:03AM +1000, cel tix44 wrote:
> When running a simple indexing test, I noticed that Xapian generates a
>  ~74 MB index database for ~24 MB of data.
> 
> Is that the expected Index-To-Data size ratio?

It can vary quite a bit between data sets, but when indexing with
positions, the index and the source data are usually around the same
size with the flint backend.

> Is there a way to make the index smaller?

You might find that compacting the database with xapian-compact makes
a significant difference.
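
To see how much difference compaction makes on a given data set, compare
the on-disk sizes before and after.  The sketch below is only an
illustration: the function names and the directory paths in the usage
comment are made up for the example, and it simply sums file sizes rather
than asking Xapian anything.

```python
import os


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full):
                total += os.path.getsize(full)
    return total


def size_ratio(index_dir, data_dir):
    """Return index size divided by source data size."""
    data = dir_size_bytes(data_dir)
    if data == 0:
        raise ValueError("no data found under %r" % data_dir)
    return dir_size_bytes(index_dir) / data


# Hypothetical usage (paths are placeholders):
#   before = size_ratio("db", "corpus")
#   ... run: xapian-compact db db-compacted ...
#   after = size_ratio("db-compacted", "corpus")
```

For the figures quoted above (~74 MB of index for ~24 MB of data) the
ratio comes out at roughly 3.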

Culling stopwords at index time can save quite a bit of space.  Also,
filtering out "junk" terms can too in some applications - for example,
when indexing email, ASCII art in signatures doesn't produce useful
terms for searching on.
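
As a rough illustration of that kind of filtering, here is a sketch that
drops stopwords and obviously junk terms before they would be handed to
the indexer.  The stopword list, the junk heuristics and the function
names are all assumptions made for the example and would need tuning for
real data; for the stopword part, Xapian's TermGenerator also has a
set_stopper() method.

```python
import re

# Hypothetical, deliberately tiny stopword list; real lists are longer.
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it"})

# Split text into lowercase alphanumeric runs.
_WORD = re.compile(r"[a-z0-9]+")


def looks_like_junk(term, max_len=30):
    """Guess whether a term is unlikely to be useful for searching.

    Both rules are illustrative assumptions, not anything Xapian does.
    """
    # Very long runs are rarely real words.
    if len(term) > max_len:
        return True
    # One character repeated four or more times, as in ASCII art.
    if len(term) >= 4 and len(set(term)) == 1:
        return True
    return False


def terms_to_index(text):
    """Return the terms from text that survive both filters, in order."""
    return [
        term
        for term in _WORD.findall(text.lower())
        if term not in STOPWORDS and not looks_like_junk(term)
    ]
```

The surviving terms are what you would then add to each document; how
much space this saves depends heavily on the data.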

The development backend (chert) does quite a bit better too (the
postlist table is ~44% smaller for gmane).

Cheers,
    Olly


