[Xapian-discuss] Java threads
James Aylett
james-xapian at tartarus.org
Fri Nov 2 11:20:13 GMT 2007
On Thu, Nov 01, 2007 at 09:11:05PM +0000, Olly Betts wrote:
> For indexing, you can only have one WritableDatabase object for each
> database, so you probably want to have a single thread handling updates,
> and other threads passing Document objects to it to be added.
That's what I'd do. Use a fast thread-safe single-reader queue (this
allows optimisations that multi-reader queues don't), throw your fresh
Document objects at it and scoop them up in a (slower) writer thread.
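
A rough sketch of that shape in Java, in case it helps. Everything
named here is an assumption for illustration: Doc is a hypothetical
stand-in for a Xapian Document (this is not the real Java bindings'
API), and LinkedBlockingQueue is a general-purpose queue from
java.util.concurrent rather than a specialised single-reader one. The
point is only that many threads put() and exactly one thread take()s.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class SingleWriterSketch {
    // Hypothetical stand-in for a Xapian Document; not the real Java bindings.
    static final class Doc {
        final String text;
        Doc(String text) { this.text = text; }
    }

    // Sentinel (compared by reference) telling the writer no more documents will arrive.
    private static final Doc DONE = new Doc("");

    /** Starts several producer threads and one writer thread; returns what the writer saw. */
    static List<String> run(int producers, int docsPerProducer) {
        // Bounded, so producers block instead of exhausting memory if the writer falls behind.
        BlockingQueue<Doc> queue = new LinkedBlockingQueue<>(1024);
        List<String> written = new ArrayList<>(); // touched only by the writer thread

        Thread writer = new Thread(() -> {
            try {
                for (Doc d = queue.take(); d != DONE; d = queue.take()) {
                    // A real program would add the document to its one WritableDatabase here.
                    written.add(d.text);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int id = p;
            Thread t = new Thread(() -> {
                try {
                    for (int i = 0; i < docsPerProducer; i++) {
                        queue.put(new Doc("p" + id + "-d" + i));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }

        try {
            for (Thread t : threads) t.join(); // wait until every producer has finished
            queue.put(DONE);                   // then tell the writer to stop
            writer.join();                     // join() makes 'written' safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return written;
    }
}
```

Calling SingleWriterSketch.run(4, 25) should hand back all 100 entries,
in whatever order the producers happened to interleave.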
> But assuming by "PC" you mean a typical desktop machine, I doubt it's
> worth multithreading indexing of files - the main bottleneck is likely
> to be reading data from the disks, and multithreading that probably
> won't help performance. Most PCs don't have a RAID for the disk
> subsystem, so interleaving file accesses will probably just send the
> disk head seeking back and forth.
If you have lots of small files and a large amount of memory you could
read things into memory from one thread, and have multiple threads
indexing. That will only beat naive single threading when indexing
each document takes a long time. (And your read thread will stall
regularly in that case.) It also requires plenty of cores to make it
worthwhile, I suspect (unless your OS can do funky IO rescheduling of
threads, which I don't think will actually help in this case because
only two of them are doing IO anyway).
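
For what it's worth, here is a minimal sketch of that layout: one
reader thread, a pool of indexing threads, and a single writer. The
names are all made up for illustration. Strings stand in for file
contents and for finished documents, and index() is a hypothetical
placeholder for the expensive per-document work; none of this is
Xapian API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ReadThenIndexSketch {
    // End-of-input sentinel; deliberately compared by reference (==), not equals().
    private static final String END = new String("END");

    interface Task { void run() throws InterruptedException; }

    // Starts a thread that runs the task and restores the interrupt flag if interrupted.
    static Thread start(Task task) {
        Thread t = new Thread(() -> {
            try {
                task.run();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Hypothetical placeholder for slow per-document work (parsing, stemming, ...).
    static String index(String raw) {
        return raw.toLowerCase(Locale.ROOT);
    }

    static List<String> run(List<String> rawFiles, int workers) {
        // Both queues are bounded so the reader stalls rather than filling memory.
        BlockingQueue<String> loaded = new LinkedBlockingQueue<>(64);  // reader -> indexers
        BlockingQueue<String> indexed = new LinkedBlockingQueue<>(64); // indexers -> writer
        List<String> written = new ArrayList<>(); // touched only by the writer thread

        // The only thread that would read input files from disk.
        Thread reader = start(() -> {
            for (String raw : rawFiles) loaded.put(raw);
            for (int i = 0; i < workers; i++) loaded.put(END); // one sentinel per indexer
        });

        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            pool.add(start(() -> {
                for (String raw = loaded.take(); raw != END; raw = loaded.take()) {
                    indexed.put(index(raw));
                }
            }));
        }

        // The only thread that would touch the database.
        Thread writer = start(() -> {
            for (String doc = indexed.take(); doc != END; doc = indexed.take()) {
                written.add(doc);
            }
        });

        try {
            reader.join();
            for (Thread t : pool) t.join();
            indexed.put(END); // all indexers are done, so the writer can stop
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return written;
    }
}
```

The output order depends on scheduling, so anything that cares about
document order would need to carry an explicit sequence number along.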
There's some interesting discussion going on at the moment about how
to do things that are IO-bound in a threaded context. I don't think there
are any final conclusions yet, but this is something that smart people
are working on, and once there's consensus it should influence the
future direction of Xapian in some way :)
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org