[Xapian-discuss] How to update DB concurrently?

oscaruser at programmer.net oscaruser at programmer.net
Thu May 18 23:13:54 BST 2006


Folks,

I switched to flint, set XAPIAN_FLUSH_THRESHOLD, and rolled the
indexer into the spiders. Now it creates 150 separate indexes. I am
using omega.cgi to perform searches. How can I query all 150
databases at the same time?

Thanks

> ----- Original Message -----
> From: "Olly Betts" <olly at survex.com>
> To: oscaruser at programmer.net
> Subject: Re: [Xapian-discuss] How to update DB concurrently?
> Date: Thu, 18 May 2006 09:41:22 +0100
> 
> 
> On Wed, May 17, 2006 at 08:52:58PM -0800, oscaruser at programmer.net wrote:
> > How can I increase or improve the rate of the indexer to the level the
> > spiders are processing the URLs?
> 
> Hmm, I'd imagine 150 spiders are probably netting you several hundred
> documents per second, maybe thousands.
> 
> Some ideas:
> 
> * Read http://www.xapian.org/docs/scalability.html if you haven't
>    already.
> 
> * Make sure the indexer is running continuously and don't call flush()
>    explicitly.
> 
> * Batch up updates by setting XAPIAN_FLUSH_THRESHOLD in the
>    environment (don't forget to export it!)  It defaults to 10000 - if
>    you've plenty of RAM, you can raise this substantially.  Gmane uses
>    100000 (100 thousand) currently.
> 
> * Use the flint backend instead of quartz:
>    http://wiki.xapian.org/FlintBackend
>    Don't be put off by the warning - the current state is very stable
>    (sufficiently good that I'm contemplating forking off a copy as
>    the default backend for Xapian 1.0.)
> 
> * Make sure the machine has plenty of RAM and fast disks.
> 
> * Run several indexers into separate databases and merge these later
>    with xapian-compact (for flint) or quartzcompact (for quartz).  The
>    indexing rate drops off gradually as database size grows, so the
>    fastest way to build a large database is to build a number of
>    databases and merge - gmane builds databases containing 1 million
>    documents each and then merges them together.  I chose this threshold
>    after doing a bit of profiling so it's a good starting value, but you
>    may be able to tune it further and it'll depend on your hardware too.
> 
> * If you aren't trying to read from the databases while building
>    them, you could try enabling "dangerous mode" - for flint you
>    just need to uncomment the obvious #define in
>    backends/flint/flint_table.cc (search for DANGEROUS) and recompile.
>    "Dangerous" mode updates blocks in place rather than ensuring the
>    old version is preserved, so reading while writing won't work, and
>    (this is the "danger" bit) if the power fails or the system crashes
>    your database may not be in a consistent state.  But it reduces the
>    amount of I/O and buys you a little speed.  I use this mode to build
>    gmane's database.
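
A minimal sketch of two of the points above (exporting the flush
threshold to the indexing process, and merging separately built
databases afterwards), written in Python with only the standard
library. The helper names and the database paths (spider-000.db,
merged.db) are hypothetical, and it assumes xapian-compact takes the
source databases first and the destination last; check
`xapian-compact --help` on your install before relying on it.

```python
import os
import subprocess
import sys


def indexer_env(threshold=100000):
    """Return a copy of the current environment with the flush threshold set."""
    env = dict(os.environ)
    # Environment values must be strings; 100000 is the figure quoted for Gmane.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(threshold)
    return env


def compact_command(sources, destination):
    """Build the argv for merging several databases into one (nothing is run)."""
    if not sources:
        raise ValueError("need at least one source database")
    # Assumed argument order: source databases first, destination last.
    return ["xapian-compact", *sources, destination]


def child_sees_threshold(env):
    """Spawn a child process and return the threshold value it observes."""
    code = "import os; print(os.environ.get('XAPIAN_FLUSH_THRESHOLD', 'unset'))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    env = indexer_env(100000)
    # Confirms the variable really reaches a child process.
    print(child_sees_threshold(env))
    # Hypothetical per-spider database names.
    shards = ["spider-%03d.db" % i for i in range(3)]
    print(" ".join(compact_command(shards, "merged.db")))
```

Passing the modified environment to the child is the equivalent of
`export` in a shell: a variable that is set but not exported never
reaches the indexer. The merge command is only built here, not
executed, so it can be inspected before being handed to
subprocess.run().
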
> 
> I also have plans for a number of improvements, which I'm working on
> in an ongoing fashion.  If you're in a hurry and have a budget for
> your project, then funding is always welcome and would enable me to
> devote more time to this work!
> 
> Cheers,
>      Olly