[Xapian-discuss] time to build index
Jeroen van Dijk
jeroentjevandijk at gmail.com
Thu Oct 16 16:24:14 BST 2008
Thanks for your reply, Olly. A wrong 'XAPIAN_FLUSH_THRESHOLD' setting, as
you suggested, was indeed one of the reasons it took so long. The other
reasons were a bad network connection and the wrong MySQL gem (I'm working
with Ruby).
The indexing process took 3 hours and created an index database of around
350 MB.
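
For anyone who finds this thread later, here is a minimal sketch of raising
the threshold from Ruby rather than from the shell. The helper name
with_flush_threshold is hypothetical (it is not part of the Xapian bindings),
and the sketch assumes the variable is read when the writable database is
opened, so the database should be opened inside the block.

```ruby
# Hypothetical helper, not part of the Xapian Ruby bindings.
# Sets XAPIAN_FLUSH_THRESHOLD for the duration of the block and restores
# the previous value (or unsets the variable) afterwards.
def with_flush_threshold(documents)
  previous = ENV['XAPIAN_FLUSH_THRESHOLD']
  ENV['XAPIAN_FLUSH_THRESHOLD'] = documents.to_s
  yield
ensure
  # Assigning nil to an ENV key removes the variable again.
  ENV['XAPIAN_FLUSH_THRESHOLD'] = previous
end

with_flush_threshold(1_000_000) do
  # Assumption: open the writable database and run the indexing loop here,
  # e.g. Xapian::WritableDatabase.new(path, Xapian::DB_CREATE_OR_OPEN).
  puts "XAPIAN_FLUSH_THRESHOLD=#{ENV['XAPIAN_FLUSH_THRESHOLD']}"
end
```

Setting the variable in the shell before starting the indexer works just as
well; the block form only keeps the change local to the indexing run.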
Now I'll see if I can get it running with my Rails app :)
Jeroen
On Wed, Oct 15, 2008 at 3:58 PM, Olly Betts <olly at survex.com> wrote:
> On Wed, Oct 15, 2008 at 02:16:15PM +0200, Jeroen van Dijk wrote:
> > The indexing process got to 1.2 million records and then it lost the
> > connection (my own fault, I guess) after 16 hours and had built up an
> > indexing database of around 300 MB.
> >
> > Should I be suspicious or should I just wait a little longer?
>
> That seems rather slow. It depends on the data and the hardware, but
> I'd expect more like a million documents per hour.
>
> If you aren't already, try setting XAPIAN_FLUSH_THRESHOLD in the
> environment to a value higher than the default of 10000. The best value
> depends on the nature of the data and how much memory you have, but
> 1000000 is worth a try.
>
> I've just realised that we don't actually seem to document
> XAPIAN_FLUSH_THRESHOLD anywhere, which probably explains why I have to
> keep highlighting it on the mailing list! I'll write up something...
>
> Cheers,
> Olly
>