[Xapian-discuss] Tika memory problems. Omindex restrictions?
Charles
xapian at catcons.co.uk
Sat Jun 18 15:53:28 BST 2011
Hello :-)
I don't know what the significant change was, but running Tika from
omindex started failing on 247 files out of a tree of 400, with the
error message "java.lang.OutOfMemoryError: requested <number> bytes for
CHeapObj-new. Out of swap space?". The biggest file in the tree is
~95 MB; most are under 1 MB.
The files triggering the error had extensions doc, pdf, ppt, rtf and
xls so the problem is probably not specific to the file type.
Running vmstat with a 1 second delay during the omindex run showed no
swapping and consistently ~0.5 GB (of 1 GB) free memory, so the problem
does not appear to be a shortage of system memory.
The bash ulimit command reported "unlimited" and
/etc/security/limits.conf is all comments or empty lines.
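
Resource limits are per-process and inherited, so what ulimit reports in
an interactive shell is not necessarily what a child process started by
another program sees. As a rough sketch (not something taken from
omindex), the following Python script prints the limits of whichever
process runs it; the choice of the three limits shown is an assumption
about which ones matter for memory errors.

```python
import resource

# Limits most likely to matter for allocation failures (an assumption,
# not a list taken from omindex or Tika).
LIMITS = {
    "address space (ulimit -v)": resource.RLIMIT_AS,
    "data segment (ulimit -d)": resource.RLIMIT_DATA,
    "stack (ulimit -s)": resource.RLIMIT_STACK,
}


def describe(value):
    """Render a raw limit value the way ulimit does for 'no limit'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {label: (soft, hard)} for the calling process."""
    report = {}
    for label, which in LIMITS.items():
        soft, hard = resource.getrlimit(which)
        report[label] = (describe(soft), describe(hard))
    return report


if __name__ == "__main__":
    for label, (soft, hard) in current_limits().items():
        print(f"{label}: soft={soft} hard={hard} (bytes)")
```

Run from the shell, this should agree with ulimit; run as a child of some
other program, it shows the limits that program passed on.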
Omindex ran Tika OK on this development system from installation on
31 Mar 2011 until it was last used on 14 Apr 2011. All system changes
are logged, but none of the changes since 14 Apr 2011 is obviously
relevant.
The OS is Debian Squeeze 64 bit running in a virtual machine -- hence
the small sample of 400 files and the 1 GB memory.
Changing the VirtualBox VM memory from ~1 GB to 3072 MB fixed the
problem. I then changed it back to 1024 MB and tried to reproduce the
problem, but the behaviour had changed: the java.lang.OutOfMemoryError
message no longer appeared. Some files now generated std::bad_alloc
messages, but most simply reported "Aborted" (I don't know whether that
message is from omindex or Tika).
Results for the file types for which omindex uses Tika as a filter:
doc files:   tried: 134, failed: 60   44.77%
docx files:  tried:   1, failed:  0
odp files:   tried:   1, failed:  0
ods files:   tried:  23, failed:  0
odt files:   tried:  71, failed:  0
pdf files:   tried:  81, failed: 81  100.00%
ppt files:   tried:   4, failed:  4  100.00%
rtf files:   tried:   2, failed:  2  100.00%
xls files:   tried:  27, failed: 27  100.00%
Taking a sample of failing Tika commands from omindex output and running
them at the command prompt does not produce any errors. It is beginning
to look as if the problem is caused by the environment that omindex sets
up for Tika to run in. Does that make any sense?
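
One way to test that idea from the command prompt is to run a failing
command under a deliberately lowered address-space limit and see whether
the same errors come back. The Python sketch below does that. It is only
an illustration: the helper name run_with_address_space_limit and the
1 GiB default (chosen to match the VM's memory) are hypothetical, and
whether omindex actually applies such a limit is an assumption to be
checked, not a confirmed fact.

```python
import resource
import subprocess
import sys

# Hypothetical default, picked to match the 1 GB VM; adjust as needed.
DEFAULT_LIMIT_BYTES = 1024 ** 3


def run_with_address_space_limit(command, limit_bytes=DEFAULT_LIMIT_BYTES):
    """Run command with its soft RLIMIT_AS lowered to limit_bytes.

    The limit is applied in the child only, between fork and exec, so
    the calling process is unaffected. If limit_bytes exceeds the
    existing hard limit, setrlimit fails and subprocess raises.
    """
    def lower_limit():
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))

    return subprocess.run(
        command,
        preexec_fn=lower_limit,
        capture_output=True,
        text=True,
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the command to run,
    # e.g. one of the failing commands copied from omindex output.
    result = run_with_address_space_limit(sys.argv[1:])
    print("exit status:", result.returncode)
    print(result.stderr, end="")
```

If a command that succeeds normally fails under the limit with the same
OutOfMemoryError or "Aborted" output, that would point at an inherited
limit rather than at the files themselves.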
Best
Charles