[Xapian-discuss] xapian indexing size?

rm at fabula.de rm at fabula.de
Thu May 5 19:09:41 BST 2005


On Thu, May 05, 2005 at 01:54:18PM -0400, John Paige wrote:
> Yes, I was expecting that to be smaller than the corpus size. 
> There are application tools like Glimpse,

Can't comment on this - I gave up on Glimpse because of licence issues :-/

> swish-e for example creates
> an index that is much smaller than the corpus size (between 10 - 25%
> of the corpus size).

Hmm, but isn't it swish-e that only indexes selected words from the corpus?
That's of course a neat way to get smaller indices.
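
To illustrate the point, here is a minimal Python sketch of a toy positional
index. It is not Xapian or swish-e code: the `keep` parameter and both helper
names are hypothetical, and the word selection is only a stand-in for whatever
swish-e actually does. It just shows that indexing a subset of the words
stores far fewer postings than indexing every word with its position.

```python
from collections import defaultdict


def build_index(docs, keep=None):
    """Map each term to a list of (doc_id, position) postings.

    If `keep` is given, only terms in that set are indexed. This is a
    hypothetical stand-in for "selected words", not swish-e's real logic.
    """
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            if keep is None or term in keep:
                index[term].append((doc_id, pos))
    return index


def posting_count(index):
    """Total number of (doc_id, position) entries stored in the index."""
    return sum(len(postings) for postings in index.values())


docs = ["the quick brown fox", "the lazy dog and the fox"]

full = build_index(docs)                          # every word, with positions
selected = build_index(docs, keep={"fox", "dog"})  # selected words only

print(posting_count(full), posting_count(selected))
```

In the full index every word occurrence costs one entry on top of the
per-term overhead, which is one reason such an index can end up larger than
the text it was built from.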

 cheers RalfD

> Thanks,
> John
> 
> On 5/5/05, rm at fabula.de <rm at fabula.de> wrote:
> > On Thu, May 05, 2005 at 01:39:20PM -0400, John Paige wrote:
> > > Hi,
> > >    I am evaluating Xapian for use in our product. I just downloaded the
> > > core and example code from the website.
> > > I'm puzzled about one thing though: when I used the test program
> > > "simpleIndexer", I found that the index is four times the
> > > size of the corpus. I indexed 4MB worth of text files, and the index
> > > was 16MB; even after compaction, it still consumed 10MB.
> > > When I added an additional 4MB of text files, the original index went to 32MB.
> > >
> > > The index size is four times the size of the corpus, which doesn't seem
> > > right. Am I doing something wrong?
> > 
> > Most likely not - but tell us what you _expect_ the index size to be?
> > Do you expect the index size to be _smaller_ than the corpus?
> > 
> >  Cheers Ralf Mattes
> > 
> > > Thanks,
> > > John
> > >
> > > _______________________________________________
> > > Xapian-discuss mailing list
> > > Xapian-discuss at lists.xapian.org
> > > http://lists.xapian.org/mailman/listinfo/xapian-discuss
> >


