[Xapian-discuss] Re: Re: BUG IN XAPIAN_FLUSH_THRESHOLD

Sungsoo Kim kiss at imageclick.com
Tue Aug 28 03:17:22 BST 2007


Dear Olly,

I have the same experience with Xapian 0.9.4 that Kevin described before. I am sure that XAPIAN_FLUSH_THRESHOLD is not working in 0.9.4: even after setting the XAPIAN_FLUSH_THRESHOLD environment variable to 100,000, I can see my indexer still stops for a while every 10,000 records to flush the buffer.
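
For anyone who wants to rule out the environment first, here is a minimal Python sketch of the check Olly suggested. It only reports what value a process would see in its own environment. The helper name effective_flush_threshold and the fallback rules are my own assumptions for illustration, not Xapian's actual parsing code; the 10000 default is the figure quoted in this thread.

```python
import os

# Default quoted in this thread; treat it as an assumption, not a spec.
DEFAULT_FLUSH_THRESHOLD = 10000


def effective_flush_threshold(environ=None):
    """Return the threshold this process would see (hypothetical helper).

    Falls back to the default when the variable is unset, not an
    integer, or not positive. Xapian's real rules may differ.
    """
    env = os.environ if environ is None else environ
    raw = env.get("XAPIAN_FLUSH_THRESHOLD")
    if raw is None:
        return DEFAULT_FLUSH_THRESHOLD
    try:
        value = int(raw.strip())
    except ValueError:
        return DEFAULT_FLUSH_THRESHOLD
    return value if value > 0 else DEFAULT_FLUSH_THRESHOLD


if __name__ == "__main__":
    # Run this the same way the indexer is run (same shell, same cron entry).
    print("threshold seen by this process:", effective_flush_threshold())
```

If this prints the default when run the same way as the indexer, the variable is not reaching the process, which would point at the environment rather than at Xapian.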

In my case it is not critical because my database holds only about 3 million records.


Regards,


Sungsoo Kim



----- Original Message ----- 
From: "Kevin Duraj" <kevin.softdev at gmail.com>
To: "Xapian Discussion" <xapian-discuss at lists.xapian.org>; "Kevin Duraj" <kevin.softdev at gmail.com>
Sent: Tuesday, August 28, 2007 6:39 AM
Subject: Re: [Xapian-discuss] Re: BUG IN XAPIAN_FLUSH_THRESHOLD


> Olly,
> 
> Basically I have given up on using the XAPIAN_FLUSH_THRESHOLD
> environment variable, and instead always change the values in
> xapian-core-1.0.2/backends/flint/flint_database.cc
> to 20 million and then recompile Xapian.
> 
> I have been exporting XAPIAN_FLUSH_THRESHOLD properly. My indexing
> is running from crontab, and I know crontab may not see the same
> environment variables.
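
On the crontab point: one way to avoid depending on what cron passes along is to set the variable explicitly when launching the indexer. The following is only a sketch under my own assumptions. The wrapper name run_with_flush_threshold is hypothetical, and the child command here is a stand-in that just prints the variable; a real indexer command would go in its place.

```python
import os
import subprocess
import sys


def run_with_flush_threshold(command, threshold):
    """Run `command` with XAPIAN_FLUSH_THRESHOLD set in its environment.

    Hypothetical wrapper: copies the current environment, overrides the
    one variable, and returns the CompletedProcess.
    """
    env = dict(os.environ)
    env["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)
    return subprocess.run(
        command, env=env, check=True, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Stand-in child process that reports what it sees; replace with
    # the real indexer command line.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ.get('XAPIAN_FLUSH_THRESHOLD'))",
    ]
    result = run_with_flush_threshold(child, 100000)
    print("child saw:", result.stdout.strip())
```

Calling a wrapper like this from the cron entry means the value no longer depends on which shell profile cron happens to read.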
> 
> I think developers who are indexing large amounts of data (1 million
> and up) are having the same issue with slow indexing, and raising
> XAPIAN_FLUSH_THRESHOLD makes a dramatic positive change in performance.
> I know there are many folks still using 2 GB of memory and 32-bit
> machines; however, we must focus our development on utilizing 64-bit
> multi-processor machines with lots of memory.
> 
> Example system with 32 processors and 256 GB of memory:
> http://hpcsystems.com/
> 
> We can start to compete with major search engines such as Google etc
> ... can you see the FUTURE? We need to come up with ways of utilizing
> all the CPUs and all the memory available on a server when indexing,
> compacting, searching, running omega tools etc ...
> 
> PS: I am flying to Asia and won't answer emails for a while ...
> -- 
> Cheers,
>   Kevin Duraj
>   http://pacificair.com
> 
> On 8/22/07, Olly Betts <olly at survex.com> wrote:
>> On Tue, Jul 17, 2007 at 12:51:27PM -0700, Kevin Duraj wrote:
>> > Okay XAPIANS I found the Bug!
>> >
>> > flint_database.cc for whatever reason is not picking up the
>> > environment variable XAPIAN_FLUSH_THRESHOLD, which makes the indexing
>> > VERY SLOW, because it defaults to 10000 documents. I was going
>> > crazy for the past month after we switched to FLINT, not able to figure
>> > out why indexing was going so slowly. Therefore I hard-coded my own
>> > flush_threshold directly in flint_database.cc and now indexing is going
>> > as fast as before!
>>
>> Hmm, the code there looks fine to me.  Also, it hasn't changed for about
>> 2 years, so this doesn't look like a bug in Xapian to me.
>>
>> Kevin: Are you sure you're exporting XAPIAN_FLUSH_THRESHOLD after you
>> set it?  Also, check you spelled it correctly!
>>
>> Also, does your patch fix all your complaints about flint performance
>> in 1.0.x?
>>
>> Cheers,
>>    Olly
>>

