[Xapian-discuss] Improving indexing speed
Robert Kaye
rob at eorbit.net
Tue Jul 1 07:07:19 BST 2008
On Jun 30, 2008, at 10:23 PM, Olly Betts wrote:
> If you're I/O limited (which is usually the case), then trying to split
> the load over multiple cores by indexing in parallel probably won't
> help. It may make things slower overall, as it will tend to increase
> the VM pressure, and also tend to mean disk writes will be split between
> more files.
>
> I'd also be a bit wary of the idea of trying to use a ram disk to hold
> the index. Depending how your OS's VM system works, this might mean
> you end up trying to hold two copies of the index in RAM - one in the RAM
> disk, plus a cached copy in the file cache. Or perhaps the VM system
> knows about RAM disks and is smart enough not to try to cache blocks
> from them, but it's something you ought to check.

I've been watching Xapian's indexing performance over the past few
days, and I now concur with your assessment.

However, given a sufficiently beefy machine, I think I could use
multiple cores to get this job done in a hurry. Given the memory
use/disk IO trade-off I've observed, I think my indexing machine could
run 2-4 indexers if I dedicated all 8G of RAM to the task. :)
Performance improves drastically when you give Xapian 500-700MB of RAM
to play with. About 1.5G per process would probably result in a
well-loaded machine, though that's a guess on my part.
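
As a rough sketch of the arithmetic behind that estimate: the function
below divides whatever RAM is left, after setting some aside for the OS
and file cache, by the per-process figure. The function name, the
reserve_gb parameter and the 2G reserve used in the example are
assumptions for illustration only; they are not measurements, and
nothing here calls Xapian.

```python
def max_indexers(total_ram_gb, per_process_gb, reserve_gb, cores):
    """Estimate how many indexer processes fit in a RAM budget.

    reserve_gb is a hypothetical allowance for the OS and file cache;
    the result is also capped at the number of cores.
    """
    if per_process_gb <= 0:
        raise ValueError("per_process_gb must be positive")
    usable_gb = total_ram_gb - reserve_gb
    if usable_gb <= 0:
        return 0
    by_ram = int(usable_gb // per_process_gb)
    return min(by_ram, cores)


if __name__ == "__main__":
    # 8G machine, 8 cores, ~1.5G per indexer, assumed 2G reserve.
    print(max_indexers(8, 1.5, 2, 8))
```

With a 2G reserve this gives 4 indexers on the 8G machine, and with a
4G reserve it gives 2, which is in line with the 2-4 range guessed at
above. Whether the disks keep up is a separate question this does not
model.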

Would anyone be interested in the results of this? I've got these
8-core/8G machines sitting in my office, waiting to go to the colo, and
I could try it out. (Not that I would do this in a production
environment -- I'm not that pressed for time.)

--
--ruaok Somewhere in Texas a village is *still* missing its idiot.
Robert Kaye -- rob at eorbit.net -- http://mayhem-chaos.net