[Xapian-discuss] XAPIAN_FLUSH_THRESHOLD

Kevin Duraj kevin.softdev at gmail.com
Mon Jun 11 22:42:33 BST 2007


Richard,

I have been using XAPIAN_FLUSH_THRESHOLD to index 10 million
documents for over 6 months, and it worked fine and fast until Xapian
version 1.0. It used to take 50 minutes to index 10 million documents.
Since installing Xapian 1.0.0, indexing 10 million documents takes
approx. 16 hours. I was looking for bugs in my code, but saw that very
little memory was being used even when the threshold was set to 10 million.

I have installed Xapian 1.0.1 and it seems to be using more memory,
which is good. What might be large for you is small for others. I want
to be able to index 1 billion documents in a reasonable time. Either
Xapian 1.0 does not take the threshold into account, or the compression
that was introduced takes too much time. We need an option, set through
an environment variable, to disable any compression.
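As a rough check on the figures above, the short Python sketch below turns
the two reported timings into throughput and extrapolates to 1 billion
documents. The extrapolation assumes indexing time scales linearly with
document count, which is only an assumption for illustration and is not
something measured in this thread.

```python
# Timings quoted in this message; linear scaling is an ASSUMPTION.
DOCS = 10_000_000
OLD_SECONDS = 50 * 60         # 50 minutes, before Xapian 1.0
NEW_SECONDS = 16 * 60 * 60    # approx. 16 hours, with Xapian 1.0.0


def docs_per_second(docs, seconds):
    """Average indexing throughput."""
    return docs / seconds


def hours_for(docs, rate):
    """Hours needed to index `docs` documents at `rate` docs/second."""
    return docs / rate / 3600


old_rate = docs_per_second(DOCS, OLD_SECONDS)
new_rate = docs_per_second(DOCS, NEW_SECONDS)
slowdown = NEW_SECONDS / OLD_SECONDS

print(f"old rate: {old_rate:.0f} docs/s")
print(f"new rate: {new_rate:.0f} docs/s")
print(f"slowdown: {slowdown:.1f}x")
print(f"1 billion docs at old rate: {hours_for(1_000_000_000, old_rate):.1f} h")
print(f"1 billion docs at new rate: {hours_for(1_000_000_000, new_rate):.1f} h")
```

Under that assumption, the difference between the two rates is the
difference between a few days and a couple of months for 1 billion
documents.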

- I do not care how large the index is, or that compression reduces its size.
- I care how much time it takes to index 10-100 million documents
per index.

Thank you,
Kevin

On 6/11/07, Richard Boulton <richard at lemurconsulting.com> wrote:
> Kevin Duraj wrote:
> > I am running Xapian 1.0.0 and have the XAPIAN_FLUSH_THRESHOLD
> > environment variable set to 10 million documents.
> >
> > XAPIAN_FLUSH_THRESHOLD=10000000
> >
> > Xapian 1.0.0 does not seem to respond to the XAPIAN_FLUSH_THRESHOLD
> > setting and is not using more memory, which is causing slow indexing.
>
> I've done some performance tests recently with Xapian 1.0.0, in which
> XAPIAN_FLUSH_THRESHOLD very definitely had an effect.  10 million is a
> very large value, though: unless your documents are very small, or you
> have a very very large amount of memory, you might be getting slow
> indexing due to running out of main memory and forcing your index
> process to use swap...
>
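The memory concern in the quoted reply can be sketched as a
back-of-the-envelope estimate. The per-document buffer size and the RAM
size used below are hypothetical numbers chosen for illustration; the real
memory cost per buffered document depends on how many terms each document
contributes and is not given anywhere in this thread.

```python
def estimated_buffer_bytes(flush_threshold, avg_bytes_per_doc):
    """Rough memory needed to buffer `flush_threshold` documents
    before a flush, given an ASSUMED average in-memory cost per document."""
    return flush_threshold * avg_bytes_per_doc


def fits_in_ram(flush_threshold, avg_bytes_per_doc, ram_bytes):
    """True if the estimated buffer fits in `ram_bytes` (no swapping)."""
    return estimated_buffer_bytes(flush_threshold, avg_bytes_per_doc) <= ram_bytes


GIB = 1024 ** 3
AVG_BYTES_PER_DOC = 2 * 1024   # hypothetical: 2 KiB buffered per document
RAM_BYTES = 4 * GIB            # hypothetical: machine with 4 GiB of RAM

for threshold in (10_000, 1_000_000, 10_000_000):
    need = estimated_buffer_bytes(threshold, AVG_BYTES_PER_DOC)
    ok = fits_in_ram(threshold, AVG_BYTES_PER_DOC, RAM_BYTES)
    print(f"threshold={threshold:>10,}  ~{need / GIB:6.2f} GiB  fits={ok}")
```

With these made-up numbers a threshold of 10 million would not fit in
memory, while 1 million would; with smaller documents or more RAM the
answer changes, so the estimate is only a way to pick a starting value.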


-- 
Cheers,
   Kevin