[Xapian-discuss] Improving indexing speed

Robert Kaye rob at eorbit.net
Tue Jul 1 07:07:19 BST 2008

On Jun 30, 2008, at 10:23 PM, Olly Betts wrote:

> If you're I/O limited (which is usually the case), then trying to split
> the load over multiple cores by indexing in parallel probably won't
> help.  It may make things slower overall, as it will tend to increase
> the VM pressure, and also tend to mean disk writes will be split
> between more files.
>
> I'd also be a bit wary of the idea of trying to use a ram disk to hold
> the index.  Depending how your OS's VM system works, this might mean
> you end up trying to hold two copies of the index in RAM - one in the
> RAM disk, plus a cached copy in the file cache.  Or perhaps the VM
> system knows about RAM disks and is smart enough not to try to cache
> blocks from them, but it's something you ought to check.

I've been watching the performance of Xapian indexing over the past few
days, and I would now concur with your assessment.

However, given a sufficiently beefy machine, I think I could use
multiple cores to get this job done in a hurry. Given the trade-off
between memory use and disk I/O that I've observed, I think my indexing
machine could run 2-4 indexers if I dedicated all 8G of RAM to the
task. :) Performance improves drastically when you give Xapian
500-700MB of RAM to play with. About 1.5G per process would probably
result in a well loaded machine -- I'm guessing.
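
As a rough sanity check on those numbers, here is a minimal sketch of the
arithmetic. The function name and the 1024MB reserved for the OS and file
cache are assumptions for illustration only; none of this is measured, and
it says nothing about whether disk I/O would become the bottleneck first.

```python
def max_indexers(total_ram_mb, per_process_mb, cores, reserve_mb=1024):
    """Estimate how many indexer processes fit in a RAM budget.

    reserve_mb is a guess at what to leave for the OS and file cache
    (an assumption, not a measured figure).
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")
    usable_mb = max(total_ram_mb - reserve_mb, 0)
    by_ram = usable_mb // per_process_mb  # whole processes that fit in RAM
    return min(by_ram, cores)             # never more processes than cores


if __name__ == "__main__":
    # 8G machine, 8 cores, about 1.5G per indexer
    print(max_indexers(8192, 1536, cores=8))
```

With about 1.5G per process the RAM budget is the limit; at 500-700MB per
process the core count would cap it instead.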

Would anyone be interested in the results of this? I've got these
8-core/8G machines sitting in my office, waiting to go to the colo, and
I could try it out. (Not that I would do this in a production
environment -- I'm not that pressed for time.)


--ruaok      Somewhere in Texas a village is *still* missing its idiot.

Robert Kaye     --     rob at eorbit.net     --    http://mayhem-chaos.net
