[Xapian-discuss] Potential memory leak when assigning MSetItem values

Jeff Rand jeffreyrand at gmail.com
Wed Jul 3 20:59:21 BST 2013


I've traced a memory leak to a statement that copies values from an
MSetItem into a dictionary, which is then appended to a list in Python. We're
running Python 2.7.3, xapian-core 1.2.15 and xapian-bindings 1.2.15. The
example below reproduces the behavior. It prints the PID and pauses for
input at a few points to make the behavior easier to observe.

Run the example script below and monitor the PID's memory usage in top or a
similar program. I've observed the resident memory for this example go from
18m to 52m, and it is still at 52m after deleting the objects and running
garbage collection.

I think the MSetItems are being kept in memory and are not garbage
collected correctly, possibly because of a lingering reference to the MSet
or MSetIterator.
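
Separately from the reproduction script further down, one way to test this
guess from the Python side would be to count the wrapper objects that the
garbage collector still tracks once the query has finished. This is only a
sketch: count_live_objects is a made-up helper, not part of Xapian, and it
sees only Python-level objects, so memory held inside the C++ library would
not show up in its counts.

```python
import gc
from collections import Counter

def count_live_objects(prefix):
    """Count gc-tracked objects whose type name starts with `prefix`."""
    # Collect first so unreachable reference cycles don't inflate the counts
    gc.collect()
    return Counter(
        type(obj).__name__
        for obj in gc.get_objects()
        if type(obj).__name__.startswith(prefix)
    )
```

Calling count_live_objects('MSet') after run_query returns and getting a
non-empty result would suggest that something on the Python side is still
holding on to the MSet, MSetItem or MSetIterator wrappers.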


import errno
import os
import simplejson as json
import xapian as x
import shutil
import gc

def make_db(path, num_docs=100000):
    # Start from a clean slate; ignore the error if the path doesn't exist yet
    try:
        shutil.rmtree(path)
    except OSError as e:
        if e.errno != errno.ENOENT:
            raise

    db = x.WritableDatabase(path, x.DB_CREATE)
    # Add num_docs documents, all indexed by the same term
    for i in xrange(1, num_docs + 1):
        doc = x.Document()
        doc.set_data(json.dumps({ 'id': i, 'enabled': True }))
        doc.add_term('XTYPA')
        db.add_document(doc)
    return db

def run_query(db, num_docs=100000):
    e = x.Enquire(db)
    e.set_query(x.Query('XTYPA'))
    m = e.get_mset(0, num_docs, True, None)

    # Store the MSetItem's data, which causes a memory leak
    data = []
    for i in m:
        data.append({ 'data': i.document.get_data(), 'id': i.docid, })

    # Make sure I'm not crazy
    del num_docs, db, i, e, m, data
    gc.collect()

def main():
    # print the PID to monitor
    print 'PID to monitor: {}'.format(os.getpid())

    db = make_db('/tmp/test.db')
    raw_input("database is done, ready?")

    run_query(db, 100000)
    raw_input('done?')

if __name__ == '__main__':
    main()
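
To get the memory figures into the script's own output instead of reading
them off top, a small helper along these lines could be called before and
after run_query. It is only a sketch: current_rss_kb is a made-up name, and
it assumes Linux's /proc/self/statm. On other platforms it falls back to the
peak RSS reported by the resource module, which will not go down again and
whose units differ between systems.

```python
import resource

def current_rss_kb():
    """Best-effort resident set size of this process, in KiB."""
    try:
        # Linux only: the second field of /proc/self/statm is resident pages
        with open('/proc/self/statm') as f:
            resident_pages = int(f.read().split()[1])
        return resident_pages * resource.getpagesize() // 1024
    except (IOError, OSError):
        # Fallback: peak (not current) RSS; KiB on Linux, bytes on Mac OS X
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```

Printing current_rss_kb() just before run_query(db, 100000) and again after
it returns should show roughly the same growth that top reports.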

