[Xapian-discuss] xapian indexing size?
rm at fabula.de
Thu May 5 19:09:41 BST 2005
On Thu, May 05, 2005 at 01:54:18PM -0400, John Paige wrote:
> Yes, I was expecting that to be smaller than the corpus size.
> There are application tools like Glimpse,
Can't comment on this - I gave up on Glimpse because of licence issues :-/
> swish-e for example creates
> an index that is much smaller than the corpus size (between 10 and 25%
> of the corpus size).
Hmm, but isn't it swish-e that only indexes selected words from the corpus?
That's of course a neat way to get smaller indices.
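
To illustrate the point about selective indexing, here is a minimal Python sketch of a toy positional inverted index, built once over every word and once with a word filter applied. It is not how swish-e or Xapian actually store their data; the STOPWORDS set, the sample documents, and the helper names build_index and posting_count are all made up for this example.

```python
from collections import defaultdict

# Hypothetical list of words to skip, for illustration only.
STOPWORDS = {"the", "a", "of", "is", "to", "and", "in"}

def build_index(docs, skip=frozenset()):
    """Map each term to a list of (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, word in enumerate(text.lower().split()):
            if word not in skip:
                index[word].append((doc_id, pos))
    return index

def posting_count(index):
    """Total number of postings, a rough proxy for index size."""
    return sum(len(postings) for postings in index.values())

# Made-up sample corpus.
docs = [
    "the size of the index is larger than the corpus",
    "a smaller index is easy to get in the selective case",
]

full = build_index(docs)                         # every word indexed
selective = build_index(docs, skip=STOPWORDS)    # selected words only

print("postings, all words:     ", posting_count(full))
print("postings, selected words:", posting_count(selective))
```

The filtered index holds fewer postings because frequent words are dropped, at the cost of no longer being able to search for them. How much a real index shrinks depends on the corpus and on what else is stored per posting.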
cheers RalfD
> Thanks,
> John
>
> On 5/5/05, rm at fabula.de <rm at fabula.de> wrote:
> > On Thu, May 05, 2005 at 01:39:20PM -0400, John Paige wrote:
> > > Hi,
> > > I am evaluating Xapian for use in our product. I just downloaded the
> > > core and examples code from the website.
> > > I'm puzzled about one thing though: when I used the test program
> > > "simpleIndexer", I found that the index is four times the
> > > size of the corpus. I indexed 4MB worth of text files, and the index
> > > came to 16MB; even after compaction, it still consumed 10MB.
> > > When I added an additional 4MB of text files, the original index went to 32MB.
> > >
> > > The index size is four times the size of the corpus, which doesn't seem
> > > right. Am I doing something wrong?
> >
> > Most likely not - but tell us what you _expect_ the index size to be?
> > Do you expect the index size to be _smaller_ than the corpus?
> >
> > Cheers Ralf Mattes
> >
> > > Thanks,
> > > John
> >