[Xapian-discuss] time to build index

Jeroen van Dijk jeroentjevandijk at gmail.com
Thu Oct 16 16:24:14 BST 2008


Thanks for your reply, Olly. Leaving 'XAPIAN_FLUSH_THRESHOLD' at the wrong
setting, as you suspected, was indeed one of the reasons it took so long.
Other reasons were a bad network connection and the wrong MySQL gem (I'm
working with Ruby).
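
For anyone finding this in the archives, here is a rough sketch of how the
threshold could be set from Ruby itself rather than from the shell. The
helper name set_flush_threshold and the database path are made up for
illustration, and I'm assuming Xapian reads the variable when the writable
database is opened, so it has to be set before that point. The value
1000000 is just the starting point Olly suggested, not a tuned number.

```ruby
# Hypothetical helper, not part of Xapian or its Ruby bindings.
# Stores the flush threshold (documents buffered before a flush) in the
# process environment, where the Xapian library can pick it up.
def set_flush_threshold(documents = 1_000_000)
  unless documents.is_a?(Integer) && documents.positive?
    raise ArgumentError, 'threshold must be a positive integer'
  end

  # Environment variables are always strings.
  ENV['XAPIAN_FLUSH_THRESHOLD'] = documents.to_s
end

# Assumption: this must run before the writable database is opened.
set_flush_threshold

# Then open the database as usual (path is a placeholder):
# require 'xapian'
# db = Xapian::WritableDatabase.new('/path/to/index', Xapian::DB_CREATE_OR_OPEN)
```

Setting it in the shell before starting the indexer should work just as
well, e.g. XAPIAN_FLUSH_THRESHOLD=1000000 ruby index.rb (the script name is
a placeholder).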

The indexing process took 3 hours and created an index database of around
350 MB.

Now I'll see if I can get it running with my Rails app :)

Jeroen

On Wed, Oct 15, 2008 at 3:58 PM, Olly Betts <olly at survex.com> wrote:

> On Wed, Oct 15, 2008 at 02:16:15PM +0200, Jeroen van Dijk wrote:
> > The indexing process got to 1.2 million records and then it lost the
> > connection (my own fault, I guess) after 16 hours and had built up an
> > index database of around 300 MB.
> >
> > Should I be suspicious or should I just wait a little longer?
>
> That seems rather slow.  It depends on the data and the hardware, but
> I'd expect more like a million documents per hour.
>
> If you aren't already, try setting XAPIAN_FLUSH_THRESHOLD in the
> environment to a value higher than the default of 10000.  The best value
> depends on the nature of the data and how much memory you have, but
> 1000000 is worth a try.
>
> I've just realised that we don't actually seem to document
> XAPIAN_FLUSH_THRESHOLD anywhere, which probably explains why I have to
> keep highlighting it on the mailing list!  I'll write up something...
>
> Cheers,
>     Olly
>

