[Xapian-discuss] Python bindings not freeing memory during indexing

Richard Boulton richard at lemurconsulting.com
Sat Jul 7 18:23:16 BST 2007


EJ Johnson wrote:
> Hi list,
> 
> I'm new to Xapian (great stuff!!) but am running into a problem that I haven't seen explicitly mentioned on the list before.
> 
> I'm using the Python bindings for Xapian 1.0.1 on Ubuntu Dapper 6.06 LTS using xapian.org as my repository.  My hardware is an HP DL385 G2, two dual-core AMD Opterons with 8G RAM.
> 
> I'm trying to index a good chunk of documents and have a python indexer iterating through the docs and adding them to the DB.  I get up to about 45,000 docs and it croaks.  Sometimes it throws some malloc error and the last time it just segfaulted.  Essentially, the indexer process continues to use more and more RAM until it dies.  It really only makes it up to about 3G of RAM before dying and it never hits swap.

That's odd: I'd expect it to be able to get up past the amount of 
physical memory before being killed off: have you been able to determine 
why it dies?  i.e., is there an OOM killer running, or is it due to an 
internal error?
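
One rough way to tell those cases apart is to run the indexer as a child
process and look at how it ended.  The following is only a sketch: the
helper names are made up for illustration, it assumes a POSIX system and
Python 3, and the mapping from signal to cause is a hint rather than a
diagnosis (SIGKILL can come from sources other than the OOM killer, and
the kernel log is the place to confirm it).

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short, hedged diagnosis."""
    if returncode == 0:
        return "exited normally"
    if returncode > 0:
        return "exited with error status %d" % returncode
    # On POSIX, a negative return code means the child died from a signal.
    sig = signal.Signals(-returncode)
    if sig == signal.SIGKILL:
        return "killed by SIGKILL (possibly the kernel OOM killer)"
    if sig == signal.SIGSEGV:
        return "killed by SIGSEGV (crash inside the process)"
    if sig == signal.SIGABRT:
        return "killed by SIGABRT (abort, e.g. after an allocator error)"
    return "killed by %s" % sig.name


def run_indexer(argv):
    """Run an indexer command (hypothetical) and report how it ended."""
    result = subprocess.run(argv)
    return describe_exit(result.returncode)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python watch.py python my_indexer.py   (script names are placeholders)
    print(run_indexer(sys.argv[1:]))
```

If this reports SIGKILL, the kernel log should show whether the OOM
killer was responsible; SIGSEGV or SIGABRT points at an internal error.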

I've indexed some fairly large datasets with Xapian 1.0.1 using the 
Python bindings (around 20GB databases), with no problems like this.

Which version of Python are you using?  I wonder if the problem could be 
Python, rather than Xapian: it's fairly easy to fail to delete objects 
in Python, and if there was a memory leak there, that could be the cause 
of the problem.  Or maybe something in Python is
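
To check for objects accumulating on the Python side, one option is to
count live objects by type at intervals during indexing and see which
counts keep growing.  This is a minimal sketch using only the standard
gc module under Python 3; the function names and the threshold are
illustrative, and it only sees objects tracked by Python's garbage
collector, so memory held inside the C++ library would not show up here.

```python
import gc
from collections import Counter


def live_object_counts():
    """Count live GC-tracked objects, grouped by type name."""
    gc.collect()  # drop collectable garbage first so only live objects remain
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, min_increase=100):
    """Return {type name: increase} for types that grew by at least min_increase."""
    return {
        name: after[name] - before[name]
        for name in after
        if after[name] - before[name] >= min_increase
    }
```

Taking a snapshot every few thousand documents and printing
growth(previous, current) should show whether some type's count rises
steadily with the number of documents indexed.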

If you want to send your indexing script, I'll take a brief look at it 
and see if there's anything obvious wrong (probably send it direct to 
me, since the list won't accept attachments).  I've also got a copy of 
valgrind set up to run python programs, so if you send me a couple of 
documents of sample data, I can try that out.


> So, you can see that the number of docs, disk space, doc length, etc are basically the same.

Well, actually the average document length is quite significantly 
smaller in the first set of log entries; something odd is definitely 
going on there.

> My next step was to recompile Xapian and the Python bindings from source (1.0.2 is out now) and see if that helps.  Any other thoughts or suggestions are greatly appreciated!

There are updated packages available in the xapian.org/debian 
repository, too, if you want to try those.

-- 
Richard


