[Xapian-discuss] Python bindings not freeing memory during indexing
EJ Johnson
ej.johnson at rackspace.com
Mon Jul 9 02:45:05 BST 2007
From: Richard Boulton [mailto:richard at lemurconsulting.com]
>> EJ Johnson wrote:
>> As far as the errors, the first set of errors I was seeing was a failed
>> call to st95_malloc (or something like that) that seemed to be thrown
>> from the SWIG bindings. I'd have to re-run my older code and let it run
>> for 120 minutes to regenerate the error.
> Have you re-tried with the 1.0.2 release? The best way forward, I
> think, is to re-run the test, with the workarounds removed so that the
> error occurs (if it still does with 1.0.2), and to keep the full output
> so we can pick through it.
The 1.0.2 release doesn't appear to be available for Dapper, so
I downloaded the source and compiled xapian-core and xapian-bindings
for 1.0.2. I loaded the python module and verified by printing out
xapian.xapian_version_string().
I'm getting the same error from my indexer without my workarounds in
place. Here's my run-time log and the traceback it throws.
=> indexing 526 more tickets (total:94922)
File "./ticketloader.py", line 146, in ?
docid = db.add_document(doc)
RuntimeError: St9bad_alloc
real 112m27.015s
user 53m0.960s
sys 1m47.350s
-------------------------------------
Here's another snippet from my du, top, and delve log right before it
threw the RuntimeError:
==================================================================
=> Space on disk: 926M xapdb
=> 19096 ej.johns 17 0 3066m 3.0g 3780 R 100 37.3 54:33.07 ticketloader.py
=> Number of documents: 94000
=> Highest doc number: 94000
=> Average doc length: 670.029776596
==================================================================
=> Space on disk: 931M xapdb
=> 19096 ej.johns 25 0 3068m 3.0g 3900 R 100 37.3 54:40.98 ticketloader.py
=> Number of documents: 95000
=> Highest doc number: 95000
=> Average doc length: 669.278726316
One point of interest is that my hand-compiled 1.0.2 build honored
the XAPIAN_FLUSH_THRESHOLD environment variable (which I had set to
1000 for the run that produced these errors). The 1.0.1 release for
Dapper from xapian.org ignored the env var, which is what led to one
of my workarounds: calling flush() on my own.
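A minimal sketch of what a manual flush() workaround of this kind could
look like. This is not the actual ticketloader.py code: the function name
and the flush_every parameter are hypothetical, and db is only assumed to
behave like a xapian.WritableDatabase (add_document() and flush()).

```python
def add_with_periodic_flush(db, docs, flush_every=1000):
    """Add each document to db, flushing after every flush_every additions.

    db is assumed to provide add_document(doc) and flush(), as
    xapian.WritableDatabase does. Returns the number of documents added.
    """
    if flush_every < 1:
        raise ValueError("flush_every must be at least 1")
    added = 0
    for doc in docs:
        db.add_document(doc)
        added += 1
        if added % flush_every == 0:
            db.flush()  # ask the database to write pending changes to disk
    if added % flush_every != 0:
        db.flush()  # flush the final, partial batch
    return added
```

With flush_every=1000 this mirrors the XAPIAN_FLUSH_THRESHOLD value used
for the run above, but it does the flushing from the indexer itself
instead of relying on the library to read the env var.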
> Looking into why there are different document sizes with and without
> your workaround would be useful, too. You could do this by comparing
> some random documents with the delve tool in the indexes built with and
> without the workaround.
I'll kick off another round with a workaround and let you know the
differences between the docs.
> One other question - is the process you're running the indexer in using
> multiple (python) threads? There was a bug in 1.0.1 which could have
> caused corruption in this case - this is fixed in 1.0.2.
Negative, I'm not using threads. Although I'm glad to hear that it's
possible now. :-)
Thanks again,
Eric