[Xapian-discuss] How to update DB concurrently?
oscaruser at programmer.net
Fri May 19 01:39:36 BST 2006
I added the following to omega.cc:
delta:/home/oscar/xapian/omega-0.9.6# diff omega.cc ../orig/omega-0.9.6/omega.cc
33,34d32
< #include <iomanip>
< #include <sstream>
136,143d133
<
< for (int index = 0; index < 150; index++) {
< std::ostringstream s;
< s << "/svr/hda1/omega/data/mydb" << std::setfill('0') << std::setw(4) << index
< << "/default";
< //cout << s.str() << endl;
< db.add_database(Xapian::Database(s.str()));
< }
delta:/home/oscar/xapian/omega-0.9.6#
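The path-building part of the loop above can be checked on its own, without Xapian. This is a minimal sketch: `db_path` and `all_db_paths` are hypothetical helpers, not omega code. They use the same `std::setfill('0')` / `std::setw(4)` formatting and the base directory from the diff, so index 0 maps to mydb0000 and index 149 to mydb0149. In the actual patch each string is passed to `Xapian::Database` and combined with `add_database()`.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds the same path as the loop added to omega.cc,
// e.g. index 7 -> "/svr/hda1/omega/data/mydb0007/default".
std::string db_path(int index) {
    // setw(4) only pads; an index >= 10000 would produce 5 digits.
    assert(index >= 0 && index < 10000);
    std::ostringstream s;
    s << "/svr/hda1/omega/data/mydb"
      << std::setfill('0') << std::setw(4) << index  // zero-pad to 4 digits
      << "/default";
    return s.str();
}

// Hypothetical helper: paths for all spider databases (mydb0000 .. mydb0149).
std::vector<std::string> all_db_paths(int count = 150) {
    std::vector<std::string> paths;
    paths.reserve(count);
    for (int index = 0; index < count; ++index) {
        paths.push_back(db_path(index));
    }
    return paths;
}
```

Printing `all_db_paths().front()` and `.back()` before wiring the paths into `add_database()` is a quick way to confirm the directory names match what the spiders actually created.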
Opening the URL "http://delta/cgi-bin/omega.cgi" in the browser correctly shows the initial page, but searching does not return any results. The query appears to go through the templates and produces a result page with the URL "http://delta/cgi-bin/omega.cgi?P=test&DEFAULTOP=or&DB=default&FMT=query&xP=test.&xDB=default&xFILTERS=--O", but that page does not list any results. How can I show the results of a query across all the DBs?
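To see exactly which parameters the result page passes to omega.cgi, the query string from that URL can be split into key/value pairs. This is a minimal sketch assuming a hypothetical `parse_query` helper (not part of omega). It does no percent-decoding, which is fine here because the URL contains no %XX escapes. Applied to the URL above, it shows that the request names only the database `default` (DB=default).

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not omega code): split a CGI query string such as
// "P=test&DB=default" into key/value pairs. No percent-decoding is done,
// and if a key repeats, the last value wins.
std::map<std::string, std::string> parse_query(const std::string& query) {
    std::map<std::string, std::string> params;
    std::istringstream in(query);
    std::string pair;
    while (std::getline(in, pair, '&')) {
        if (pair.empty()) continue;  // tolerate "&&" or a trailing '&'
        std::string::size_type eq = pair.find('=');
        if (eq == std::string::npos) {
            params[pair] = "";  // bare key with no value
        } else {
            params[pair.substr(0, eq)] = pair.substr(eq + 1);
        }
    }
    return params;
}
```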
Thanks
> ----- Original Message -----
> From: oscaruser at programmer.net
> To: "Olly Betts" <olly at survex.com>
> Subject: Re: [Xapian-discuss] How to update DB concurrently?
> Date: Thu, 18 May 2006 15:34:49 -0800
>
>
> I just added the following lines to omega.cc (I'll do a better job
> coding it in a loop in C++)
>
> for (int i = 0; i < 150; i++)
> db.add_database(Xapian::Database(db));
>
> Thanks
>
> > ----- Original Message -----
> > From: oscaruser at programmer.net
> > To: "Olly Betts" <olly at survex.com>
> > Subject: Re: [Xapian-discuss] How to update DB concurrently?
> > Date: Thu, 18 May 2006 14:13:54 -0800
> >
> >
> > Folks,
> >
> > I switched to flint, set XAPIAN_FLUSH_THRESHOLD, and I
> > rolled the indexer into the spiders. Now it creates 150 separate
> > indexes. I am using omega.cgi to perform search. How can I query
> > all 150 dbs at the same time?
> >
> > Thanks
> >
> > > ----- Original Message -----
> > > From: "Olly Betts" <olly at survex.com>
> > > To: oscaruser at programmer.net
> > > Subject: Re: [Xapian-discuss] How to update DB concurrently?
> > > Date: Thu, 18 May 2006 09:41:22 +0100
> > >
> > >
> > > On Wed, May 17, 2006 at 08:52:58PM -0800, oscaruser at programmer.net wrote:
> > > > How can I increase or improve the rate of the indexer to the level the
> > > > spiders are processing the URLs?
> > >
> > > Hmm, I'd imagine 150 spiders are probably netting you several hundred
> > > documents per second, maybe thousands.
> > >
> > > Some ideas:
> > >
> > > * Read http://www.xapian.org/docs/scalability.html if you haven't
> > > already.
> > >
> > > * Make sure the indexer is running continuously and don't call flush()
> > > explicitly.
> > >
> > > * Batch up updates by setting XAPIAN_FLUSH_THRESHOLD in the
> > > environment (don't forget to export it!) It defaults to 10000 - if
> > > you've plenty of RAM, you can raise this substantially. Gmane uses
> > > 100000 (100 thousand) currently.
> > >
> > > * Use the flint backend instead of quartz:
> > > http://wiki.xapian.org/FlintBackend
> > > Don't be put off by the warning - the current state is very stable
> > > (sufficiently good that I'm contemplating forking off a copy as
> > > the default backend for Xapian 1.0.)
> > >
> > > * Make sure the machine has plenty of RAM and fast disks.
> > >
> > > * Run several indexers into separate databases and merge these later
> > > with xapian-compact (for flint) or quartzcompact (for quartz). The
> > > indexing rate drops off gradually as database size grows, so the
> > > fastest way to build a large database is to build a number of
> > > databases and merge - gmane builds databases containing 1 million
> > > documents each and then merges them together. I chose this threshold
> > > after doing a bit of profiling so it's a good starting value, but you
> > > may be able to tune it further and it'll depend on your hardware too.
> > >
> > > * If you aren't trying to read from the databases while building
> > > them, you could try enabling "dangerous mode" - for flint you
> > > just need to uncomment the obvious #define in
> > > backends/flint/flint_table.cc (search for DANGEROUS) and recompile.
> > > "Dangerous" mode updates blocks in place rather than ensuring the
> > > old version is preserved, so reading while writing won't work, and
> > > (this is the "danger" bit) if the power fails or the system crashes
> > > your database may not be in a consistent state. But it reduces the
> > > amount of I/O and buys you a little speed. I use this mode to build
> > > gmane's database.
> > >
> > > I also have plans for a number of improvements, which I'm working on
> > > in an on-going fashion. If you're in a hurry and have a budget for
> > > your project, then funding is always welcome and would enable me to
> > > devote more time to this work!
> > >
> > > Cheers,
> > > Olly
> >
> > >
> >
> >
> > --
> > ___________________________________________________
> > Play 100s of games for FREE! http://games.mail.com/
> >
> >
> > _______________________________________________
> > Xapian-discuss mailing list
> > Xapian-discuss at lists.xapian.org
> > http://lists.xapian.org/mailman/listinfo/xapian-discuss
>
> >
>
>
>