[Xapian-tickets] [Xapian] #637: Potential memory leak when assigning MSetItem values
Xapian
nobody at xapian.org
Mon Mar 17 22:02:25 GMT 2014
#637: Potential memory leak when assigning MSetItem values
--------------------------------------+----------------------------
Reporter: jeffrand | Owner: richard
Type: defect | Status: new
Priority: normal | Milestone: 1.2.x
Component: Xapian-bindings (Python) | Version: 1.2.15
Severity: normal | Resolution:
Keywords: Memory leak | Blocked By:
Blocking: | Operating System: Linux
--------------------------------------+----------------------------
New description:
I've traced a memory leak to a statement which assigns the values from an
MSetItem to a dictionary which is then appended to a list in Python. We're
running Python 2.7.3, xapian-core 1.2.15 and xapian-bindings 1.2.15. I've
provided an example below which reproduces the behavior. The example
prints the PID and has a few statements waiting for input to make
observing the behavior easier.

Run the following code and monitor the PID's memory usage in top or a
similar program. I've observed the resident memory for this example grow
from 18m to 52m even after deleting the objects and running garbage
collection.

I think the MSetItems are preserved in memory and are not being garbage
collected correctly, possibly due to a lingering reference to the MSet or
MSetIterator.
{{{
#!python
import os
import simplejson as json
import xapian as x
import shutil
import gc

def make_db(path, num_docs=100000):
    try:
        shutil.rmtree(path)
    except OSError, e:
        if e.errno != 2:
            raise

    db = x.WritableDatabase(path, x.DB_CREATE)
    for i in xrange(1, num_docs):
        doc = x.Document()
        doc.set_data(json.dumps({ 'id': i, 'enabled': True }))
        doc.add_term('XTYPA')
        db.add_document(doc)
    return db

def run_query(db, num_docs=100000):
    e = x.Enquire(db)
    e.set_query(x.Query('XTYPA'))
    m = e.get_mset(0, num_docs, True, None)

    # Store the MSetItem's data, which causes a memory leak
    data = []
    for i in m:
        data.append({ 'data': i.document.get_data(), 'id': i.docid, })

    # Make sure I'm not crazy
    del num_docs, db, i, e, m, data
    gc.collect()

def main():
    # print the PID to monitor
    print 'PID to monitor: {}'.format(os.getpid())

    db = make_db('/tmp/test.db')
    raw_input("database is done, ready?")

    run_query(db, 100000)
    raw_input('done?')

if __name__ == '__main__':
    main()
}}}
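For watching the numbers without top, a small helper along these lines could be dropped into the script (a sketch, not part of the original report; the /proc parsing is Linux-specific, which matches the ticket's Operating System field):

{{{
#!python
import os

def rss_kb():
    # Resident set size in kB, read from /proc (Linux-specific).
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])
    return 0

print('RSS: %d kB (pid %d)' % (rss_kb(), os.getpid()))
}}}

Calling this at each raw_input() checkpoint gives the same before/after figures as top, without having to watch the process externally.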
--
Comment (by olly):
If you ask the Python gc module how many objects are allocated, the count
doesn't increase. The attached, slightly modified version of your script
shows this (note that calling {{{gc.collect()}}} more than once sometimes
seems to be necessary to actually collect all the objects - not sure why).
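The object-count check described here can be sketched standalone like this (a minimal illustration, not the actual attached script):

{{{
#!python
import gc

def num_objects():
    # Collect a few times before counting; as noted above, a single
    # pass sometimes isn't enough to reclaim everything.
    for _ in range(3):
        gc.collect()
    return len(gc.get_objects())

before = num_objects()
data = [{ 'data': 'x' * 100, 'id': i } for i in range(10000)]
del data
after = num_objects()
# with the list deleted, before and after are essentially equal
}}}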
On trunk:
{{{
$ ./run-python-test ticket637.py
PID to monitor: 4107
database is done, ready?
num objects before = 7519
num objects after = 7519
done?
$
}}}
And HEAD of 1.2 branch:
{{{
$ PYTHONPATH=. python ticket637.py
PID to monitor: 972
database is done, ready?
num objects before = 7115
num objects after = 7115
done?
}}}
So I don't see how this can be Python hanging on to objects.
I think this is just due to C++'s allocator hanging on to memory. As I
said in my reply to the mailing list, this memory should just get reused
by later operations (like the next query you run).
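The allocator point can be seen even without Xapian: CPython's own small-object allocator shows the same shape, where RSS rises while a large number of objects is live and typically doesn't fall all the way back once they're freed (a rough illustration under that assumption; Linux-specific, and the exact figures depend on the allocator):

{{{
#!python
import gc

def rss_kb():
    # Resident set size in kB, read from /proc (Linux-specific).
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])

base = rss_kb()
data = [{ 'data': 'x' * 50, 'id': i } for i in range(200000)]
peak = rss_kb()
del data
gc.collect()
after = rss_kb()
# peak > base while the dicts are live; after freeing, RSS generally
# doesn't return all the way to base, because the allocator keeps freed
# memory around for reuse instead of returning it to the OS immediately.
}}}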
--
Ticket URL: <http://trac.xapian.org/ticket/637#comment:1>
Xapian <http://xapian.org/>