[Xapian-discuss] Re: Big process using Xapian

Olly Betts olly at survex.com
Tue Feb 6 07:18:42 GMT 2007


On Fri, Feb 02, 2007 at 03:54:57PM +0000, James Aylett wrote:
> On Fri, Feb 02, 2007 at 07:17:04AM -0800, Rafael SDM Sierra wrote:
> 
> > >[1] - 736M   694M biord  182:27  2.15% python
> >
> > I changed the Xapian flush threshold from 1000 to 10000, and the
> > process became bigger oO...
> > 
> > 2008M  1504M swread  10:36  0.00% python
> > 
> > That's all the memory I have (2GB); my swap is in use now...
> 
> Some systems cannot free main memory from the process back to the
> operating system, even if it is unused within the process.

Also, the GNU C++ STL implementation likes to "hoard" memory it has been
allocated, to avoid lots of calls to malloc() and free().  That's
generally great for speed, but it makes it harder to release memory back
to the OS even where that is possible (which I think is uncommon anyway
on Unix-like platforms).
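
To illustrate (a glibc/Linux-only sketch, nothing to do with Xapian's own
code): even after large STL containers are destroyed, the process size
reported by ps/top usually doesn't shrink, because the freed memory stays
with the allocator.  glibc's malloc_trim() can ask for unused heap pages
to be handed back, but there's no portable equivalent:

    // Illustration only: freed STL memory normally stays with the process.
    #include <cstdio>
    #include <string>
    #include <vector>
    #include <unistd.h>
    #ifdef __GLIBC__
    #include <malloc.h>   // for malloc_trim() (glibc-specific)
    #endif

    int main() {
        {
            // Build up lots of small allocations, a bit like a posting buffer.
            std::vector<std::string> buffer;
            for (int i = 0; i < 1000000; ++i)
                buffer.push_back(std::string(64, 'x'));
        }   // buffer destroyed here, but the RSS typically does not drop

        std::printf("containers freed; check RSS: ps -o rss= -p %d\n",
                    (int)getpid());
    #ifdef __GLIBC__
        malloc_trim(0);   // glibc-only: return free heap pages to the OS
        std::printf("after malloc_trim(); check RSS again\n");
    #endif
        sleep(60);        // leave time to inspect the process with ps/top
        return 0;
    }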

My long-term plan is to buffer this information in memory allocated
outside the C/C++ heap (using anon mmap or similar) so that we can
release it straight back to the OS once flush() or cancel() has been
called.
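
The reason an anonymous mmap helps is that munmap() really does return
the pages to the OS immediately, unlike free() on heap memory.  A minimal
sketch of the idea (not the actual implementation; the 64MB size is just
an example, and BSD spells the flag MAP_ANON):

    // Sketch: a buffer backed by anonymous mmap() can be handed straight
    // back to the OS with munmap().
    #include <sys/mman.h>
    #include <cstddef>
    #include <cstdio>

    int main() {
        const size_t len = 64 * 1024 * 1024;   // e.g. a 64MB posting buffer

        // MAP_ANONYMOUS gives zero-filled memory not backed by any file.
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        // ... accumulate pending changes in buf until flush() or cancel() ...

        // On flush()/cancel() the whole buffer goes back to the OS at once.
        munmap(buf, len);
        return 0;
    }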

> The larger the flush threshold, the more data has to be held in
> memory, so this might explain what you're seeing.

Indeed.  But that memory should get reused by the next batch of documents
being added, so you shouldn't see the process size continue to grow.
Also, I suspect that most of the now-unused space can simply be paged out
until another batch is added, so this shouldn't actually be a big
problem.
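
If you want tighter control over peak memory, you can also flush
explicitly after each batch rather than relying only on the automatic
threshold.  A rough sketch with the C++ API (the database path, term, and
batch size below are just placeholders):

    // Hedged sketch of batched indexing with an explicit flush per batch.
    #include <xapian.h>

    int main() {
        Xapian::WritableDatabase db("exampledb", Xapian::DB_CREATE_OR_OPEN);

        const int batch_size = 10000;   // roughly matches the flush threshold
        for (int i = 0; i < 100000; ++i) {
            Xapian::Document doc;
            doc.set_data("document data goes here");
            doc.add_term("Xsample");
            db.add_document(doc);

            // Flushing writes the pending changes to disk; the memory used
            // to buffer them should then be reused by the next batch.
            if ((i + 1) % batch_size == 0) db.flush();
        }
        db.flush();
        return 0;
    }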

Remember it's not a problem to be using swap per se - it's only a
problem when the working set of a process is getting swapped out.

Cheers,
    Olly


