[Xapian-discuss] Re: BUG IN XAPIAN_FLUSH_THRESHOLD

Kevin Duraj kevin.softdev at gmail.com
Tue Jul 17 20:51:27 BST 2007


Okay XAPIANS I found the Bug!

For whatever reason, flint_database.cc is not picking up the
environment variable XAPIAN_FLUSH_THRESHOLD, which makes indexing
VERY SLOW because the threshold defaults to 10000 documents. I was going
crazy for the past month after we switched to FLINT, unable to figure
out why indexing was so slow. So I hard-coded my own
flush_threshold directly in flint_database.cc, and now indexing is as
fast as before!

PS: Sometimes you've just got to hack it yourself ... welcome to open
source ... *hahaha*


-= MY HACK =-
vi flint_database.cc

size_t FlintWritableDatabase::flush_threshold = 20000000;

FlintWritableDatabase::FlintWritableDatabase(const string &dir, int action,
       int block_size)
: freq_deltas(),
  doclens(),
  mod_plists(),
  database_ro(dir, action, block_size),
  total_length(database_ro.postlist_table.get_total_length()),
  lastdocid(database_ro.get_lastdocid()),
  changes_made(0)
{
    DEBUGCALL(DB, void, "FlintWritableDatabase", dir << ", " << action << ", "
      << block_size);
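    // Environment lookup, commented out for this hack:
    //if (flush_threshold == 0)
    //{
    //    const char *p = getenv("XAPIAN_FLUSH_THRESHOLD");
    //    if (p) flush_threshold = atoi(p);
    //}
    //if (flush_threshold == 0) flush_threshold = 10000;
    // Hard-coded instead: flush after 20 million documents.
    flush_threshold = 20000000;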
}
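
For anyone who wants to check whether the variable is visible to the
indexing process at all, here is a minimal sketch of the same
getenv/atoi lookup as a standalone function. read_flush_threshold is a
hypothetical helper name, not part of Xapian, and unlike the code above
it also falls back on negative values. It only tests the lookup logic,
not why flint ignored the setting in my case.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not a Xapian function): mirrors the commented-out
// lookup above. Returns the value of the environment variable `name`, or
// `fallback` when it is unset, non-numeric, zero, or negative.
std::size_t read_flush_threshold(const char *name, std::size_t fallback)
{
    const char *p = std::getenv(name);
    if (!p) return fallback;        // variable not visible to this process
    int n = std::atoi(p);           // atoi yields 0 for non-numeric input
    if (n <= 0) return fallback;    // treat 0 and negatives as "not set"
    return static_cast<std::size_t>(n);
}
```

Calling read_flush_threshold("XAPIAN_FLUSH_THRESHOLD", 10000) from a
small test program started in the same shell as the indexer should
return 20000000 if the variable was exported, and 10000 if it was not.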




On 7/17/07, Kevin Duraj <kevin.softdev at gmail.com> wrote:
> There is a bug when setting XAPIAN_FLUSH_THRESHOLD=20000000
>
> When trying to force Xapian to flush after 20 million
> documents, Xapian ignores the setting and flushes after only 10,000
> documents.
>
> Output captured from delve at 60-second intervals with the following setting:
> XAPIAN_FLUSH_THRESHOLD=20000000
>
> perl -e ' while(1) { system("delve ."); sleep(60); } '
>
> number of documents = 8510000
> average document length = 13.5538
> number of documents = 8520000
> average document length = 13.5537
> number of documents = 8530000
> average document length = 13.5543
> number of documents = 8530000
> average document length = 13.5543
> number of documents = 8540000
> average document length = 13.5548
> number of documents = 8550000
> average document length = 13.5548
> number of documents = 8550000
> average document length = 13.5548
> number of documents = 8560000
> average document length = 13.5545
> number of documents = 8570000
> average document length = 13.5549
> number of documents = 8570000
> average document length = 13.5549
> number of documents = 8580000
> average document length = 13.5563
> number of documents = 8590000
> average document length = 13.5568
>
> PS: Please do not ask me to create smaller indexes and then merge them. I
> am indexing 500 million documents. 20 million is my small index.
>
> --
> Cheers,
>   Kevin Duraj
>


-- 
Cheers,
   Kevin


