[Xapian-discuss] Improving indexing speed

Robert Kaye rob at eorbit.net
Fri Jun 27 00:33:25 BST 2008


On Jun 26, 2008, at 12:33 PM, Richard Boulton wrote:
> Yes - you can control the number of documents Xapian batches  
> together during an indexing session using the XAPIAN_FLUSH_THRESHOLD  
> environment variable, which controls the number of document changes  
> to buffer.  The default is to buffer changes to 10000 documents in  
> memory, and then apply them to disk.

This helps quite a bit -- thanks for the tip! (My threshold is much
larger than the default since our documents are tiny, but finding the
right value is just a matter of experimentation.)
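
A minimal sketch of how the threshold might be set from a Python driver
script, for anyone following along. It only builds the environment for
a child process; the value 200000 and the "./my_indexer" command are
hypothetical placeholders, not settings taken from this thread.

```python
import os


def indexer_env(flush_threshold, base_env=None):
    """Return a copy of the environment with XAPIAN_FLUSH_THRESHOLD set.

    flush_threshold is the number of document changes to buffer in
    memory before they are applied to disk.
    """
    if flush_threshold <= 0:
        raise ValueError("flush threshold must be a positive document count")
    # Copy so the caller's environment is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # Environment variables are strings, so convert the count.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(flush_threshold)
    return env


# Hypothetical usage (placeholder command and value):
# import subprocess
# subprocess.run(["./my_indexer"], env=indexer_env(200000), check=True)
```

The variable has to be in the environment of the process that does the
indexing, which is why the sketch passes it to the child rather than
setting it afterwards.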

> Another approach, if your index is large, is to build several small  
> indexes, and then merge them together with "xapian-compact".   
> (Probably with the "-m" option to do multipass merging, if you end  
> up with _lots_ of small indexes.)  This method is a bit clunky, but  
> can build large indexes much faster than doing it in one go.  At  
> some point, we'll probably merge xapian-compact into the main API,  
> but for now it's only available as a standalone executable.

I am going to take this route -- I can see the disk usage creeping up
once indexing gets past 20% of my index, and the rows/second rate
degrades from that point on. Besides, dividing this task into chunks
lets me offload the process to multiple cores in my machines and then
glue things together at the end.
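
A rough sketch of the chunk-and-merge plan, under stated assumptions:
the round-robin split and both function names are illustrative, and the
code only builds the "xapian-compact" command line (source databases
first, destination last, "-m" for multipass) without running it. Each
chunk would be indexed into its own small database by a separate
worker process.

```python
def split_round_robin(items, n_chunks):
    """Deal items into n_chunks lists, one per worker process."""
    if n_chunks < 1:
        raise ValueError("need at least one chunk")
    chunks = [[] for _ in range(n_chunks)]
    for i, item in enumerate(items):
        # Item i goes to chunk i mod n_chunks, keeping sizes balanced.
        chunks[i % n_chunks].append(item)
    return chunks


def compact_command(source_dbs, dest_db, multipass=False):
    """Build the argv for merging small databases with xapian-compact."""
    if not source_dbs:
        raise ValueError("need at least one source database")
    cmd = ["xapian-compact"]
    if multipass:
        # Multipass merging, suggested above for lots of small indexes.
        cmd.append("-m")
    return cmd + list(source_dbs) + [dest_db]


# Hypothetical usage (database paths are placeholders):
# import subprocess
# subprocess.run(compact_command(["part0", "part1"], "merged"), check=True)
```

Building the command as a list rather than a shell string avoids
quoting problems if the database paths contain spaces.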

Thanks for the tips!

--

--ruaok      Somewhere in Texas a village is *still* missing its idiot.

Robert Kaye     --     rob at eorbit.net     --    http://mayhem-chaos.net



