[Xapian-discuss] Re: BUG IN XAPIAN_FLUSH_THRESHOLD

Mark Clarkson mark.clarkson at smorg.co.uk
Wed Jul 18 09:35:48 BST 2007


This is probably unnecessary to suggest, but if you are using bash, the
variable must be exported, otherwise your program will not see it:

$ export XAPIAN_FLUSH_THRESHOLD=2000000
$ your_program
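
If in doubt, here is a quick way to check whether the variable actually reaches a child process. This is only a sketch: it assumes a POSIX `sh` is on your PATH, and `child_view` is a helper name made up for the illustration, not anything from Xapian.

```shell
# Sketch: show that only exported variables reach child processes.
# child_view is a hypothetical helper that prints the value a child process sees.
child_view() {
    sh -c 'printf "%s" "${XAPIAN_FLUSH_THRESHOLD}"'
}

unset XAPIAN_FLUSH_THRESHOLD           # start from a clean state
XAPIAN_FLUSH_THRESHOLD=2000000         # plain assignment: visible to this shell only
before=$(child_view)                   # empty: the child did not inherit it

export XAPIAN_FLUSH_THRESHOLD          # mark the variable for export to children
after=$(child_view)                    # the child now sees the value

echo "without export: [$before]"
echo "with export:    [$after]"
```

Setting the variable for a single command also works without export, since the shell places it in that command's environment: `XAPIAN_FLUSH_THRESHOLD=2000000 your_program`.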

On Tue, 2007-07-17 at 12:51 -0700, Kevin Duraj wrote:
> Okay XAPIANS, I found the Bug!
> 
> flint_database.cc, for whatever reason, is not picking up the
> environment variable XAPIAN_FLUSH_THRESHOLD, which makes indexing
> VERY SLOW because it defaults to 10000 documents. I was going
> crazy for the past month after we switched to FLINT, unable to figure
> out why indexing was so slow. Therefore I hard-coded my own
> flush_threshold directly in flint_database.cc, and now indexing is
> as fast as before!
> 
> PS: Sometimes you just got to hack it yourself ... welcome to open
> source ... *hahaha*
> 
> 
> -= MY HACK =-
> vi flint_database.cc
> 
> size_t FlintWritableDatabase::flush_threshold = 20000000;
> 
> FlintWritableDatabase::FlintWritableDatabase(const string &dir, int action,
>        int block_size)
> : freq_deltas(),
>   doclens(),
>   mod_plists(),
>   database_ro(dir, action, block_size),
>   total_length(database_ro.postlist_table.get_total_length()),
>   lastdocid(database_ro.get_lastdocid()),
>   changes_made(0)
> {
>     DEBUGCALL(DB, void, "FlintWritableDatabase", dir << ", " << action << ", "
>       << block_size);
>     //if (flush_threshold == 0)
>     //{
>     //    const char *p = getenv("XAPIAN_FLUSH_THRESHOLD");
>     //    if (p) flush_threshold = atoi(p);
>     //}
>     //if (flush_threshold == 0) flush_threshold = 10000;
>     flush_threshold = 20000000;
> }
> 
> 
> 
> 
> On 7/17/07, Kevin Duraj <kevin.softdev at gmail.com> wrote:
> > There is a bug when setting XAPIAN_FLUSH_THRESHOLD=20000000
> >
> > When trying to force Xapian to flush only after 20 million
> > documents, Xapian ignores the setting and flushes after only 10,000
> > documents.
> >
> > Data captured from delve at 60-second intervals, with the variable set as follows:
> > XAPIAN_FLUSH_THRESHOLD=20000000
> >
> > perl -e ' while(1) { system("delve ."); sleep(60); } '
> >
> > number of documents = 8510000
> > average document length = 13.5538
> > number of documents = 8520000
> > average document length = 13.5537
> > number of documents = 8530000
> > average document length = 13.5543
> > number of documents = 8530000
> > average document length = 13.5543
> > number of documents = 8540000
> > average document length = 13.5548
> > number of documents = 8550000
> > average document length = 13.5548
> > number of documents = 8550000
> > average document length = 13.5548
> > number of documents = 8560000
> > average document length = 13.5545
> > number of documents = 8570000
> > average document length = 13.5549
> > number of documents = 8570000
> > average document length = 13.5549
> > number of documents = 8580000
> > average document length = 13.5563
> > number of documents = 8590000
> > average document length = 13.5568
> >
> > PS: Please do not ask me to create smaller indexes and then merge them. I
> > am indexing 500 million documents. 20 million is my small index.
> >
> > --
> > Cheers,
> >   Kevin Duraj
> >
> 
> 



