[Xapian-discuss] Tika memory problems. Omindex restrictions?

Olly Betts olly at survex.com
Mon Jun 20 02:38:02 BST 2011


On Sun, Jun 19, 2011 at 10:51:40PM +0530, Charles wrote:
> For the record (I don't see how the mailing list can help with this) ...

These are starting to sound like they're mostly Tika and/or Java issues,
though maybe it's an issue with how we set the resource limits, or we
might need to provide a workaround if Java doesn't handle such limits
well.

> I installed Tika 0.9 and tried again with the VirtualBox VM memory at  
> 1024 MB.
>
> Ran omindex several times, running "*rm /var/lib/omega/data/docoll/*"  
> before each run.  The output varied, including zero to two messages from  
> glibc such as these samples gathered over around 20 runs:
>
> *** glibc detected *** java: double free or corruption (!prev):  
> 0x0000000000642b40 ***
> *** glibc detected *** java: free(): invalid pointer: 0x000000000242e460 ***
> *** glibc detected *** java: double free or corruption (!prev):  
> 0x0000000001697b40 ***
> *** glibc detected *** java: double free or corruption (fasttop):  
> 0x0000000000b33d50 ***
> *** glibc detected *** java: double free or corruption (!prev):  
> 0x0000000000f03b30 ***
> *** glibc detected *** java: free(): invalid pointer: 0x0000000000dfe440 ***

Those sound like bugs in the JVM - potentially serious ones since double
free can lead to security vulnerabilities:

http://cwe.mitre.org/data/definitions/415.html

I guess it isn't handling running out of memory gracefully.

I don't know much about how JVMs set their memory limits by default (I
know you can specify them on the command line), but perhaps the JVM is
looking at the limits omindex sets and basing decisions on these?

You could try disabling or changing omindex's limits - see runfilter.cc
for where the limit is set and freemem.cc for where the amount of
available memory is determined.
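
As a rough illustration of the "available memory" part (this is not the
freemem.cc code, just a sketch of one possible approach), free physical
memory can be estimated on Linux via sysconf. The function name is made up
for this example, and SC_AVPHYS_PAGES is a glibc extension, so it is an
assumption that the platform provides it:

```python
import os


def free_physical_memory():
    """Estimate free physical memory in bytes (assumes Linux with glibc)."""
    page_size = os.sysconf("SC_PAGE_SIZE")       # bytes per page
    free_pages = os.sysconf("SC_AVPHYS_PAGES")   # currently unused pages
    return page_size * free_pages


print(free_physical_memory())
```

Comparing this figure with the limit the filter actually runs under may show
whether the JVM is being given less room than expected.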

Or come at it from the other end and set resource limits similar to
those omindex is setting when running Tika from the shell.
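
A minimal sketch of that second approach, assuming the limit in question is
an address-space limit (RLIMIT_AS); check runfilter.cc for what omindex
really sets. The helper name run_with_memory_limit is hypothetical, and the
command to run is whatever Tika command line you are already using:

```python
import resource
import subprocess
import sys


def run_with_memory_limit(cmd, limit_bytes):
    """Run cmd with its address space capped at limit_bytes (POSIX only)."""
    # Only the soft limit is lowered, and never above the existing hard
    # limit, so this works without extra privileges.
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    soft = limit_bytes
    if hard != resource.RLIM_INFINITY:
        soft = min(limit_bytes, hard)

    def set_limit():
        # Runs in the child between fork() and exec(), so the calling
        # process itself is not restricted.
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=set_limit,
                          capture_output=True, text=True)


# Example with a harmless command; substitute the Tika command line.
result = run_with_memory_limit([sys.executable, "-c", "print('ok')"], 1 << 30)
print(result.returncode, result.stdout.strip())
```

If the glibc errors also show up this way, with omindex out of the picture,
that would point at how the JVM copes with the limit rather than at omindex
itself.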

If you get this working, it would be useful to know how to configure
omindex to use Tika - is it just a matter of specifying the command you
mentioned in a previous mail with --filter?

> There are no problems logged in the /var/log/* files of either  
> VirtualBox host or the VirtualBox VM test system.  The host has ECC  
> memory and ECC reporting enabled in the kernel and userspace.  The host  
> is not exhibiting any evidence of instability.  The test system is not  
> being used for anything except exploring this problem.

You could try running memtest86+ if you want further evidence, though if
ECC isn't reporting errors, it's unlikely there are any.

http://www.memtest.org/

Some distros even include it as an option in the default boot menu (e.g.
Ubuntu does IIRC).

Cheers,
    Olly


