[Xapian-discuss] Tika memory problems. Omindex restrictions?
Charles
xapian at catcons.co.uk
Sun Jun 19 18:21:40 BST 2011
On 18/06/11 21:39, Charles wrote:
> Thanks Olly -- that was quick! :-)
>
> It doesn't look as if Tika is using more than 1 GB of memory. Here is
> vmstat output from running, directly at a command prompt, a Tika
> command that had failed when run by omindex. The command was:
>
>   java -jar /opt/apache/tika/apache-tika-0.8-src/tika-app/target/tika-app-0.8.jar --text <whatever>.doc
>
> The .doc file was ~4 MB:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa
>  0  0      0 762788    312 103696    0    0     0     0   55   60  0  1  99  0
>  0  0      0 762788    312 103696    0    0     0     0   85   98  1  0  99  0
>  0  0      0 762788    312 103696    0    0     0     0   34   37  0  0 100  0
>  1  1      0 755276    312 110404    0    0  6696     4  256  221  2  5  69 24
>  1  0      0 744872    312 116084    0    0  5700     0  659  635  6 18  53 23
>  1  0      0 734268    312 119868    0    0  3784     0  688  618 17 16  42 25
>  3  0      0 723720    312 122416    0    0  2560     0  586  322 47 12  30 11
>  3  0      0 708592    312 122476    0    0    40     0  689  352 81  7  12  0
>  1  0      0 702956    312 126272    0    0  3840     0  620  464 20 16  41 24
>  3  0      0 698692    312 127560    0    0  1300     0  601  318 80  8  12  1
>  0  0      0 735476    312 130472    0    0  2856     0  525  459 35 13  51  1
>  0  0      0 735476    312 130472    0    0     0     0   39   42  0  0 100  0
>  0  0      0 735476    312 130472    0    0     0     0   32   36  0  0 100  0
>  0  0      0 735476    312 130472    0    0     0     0   31   37  0  0 100  0
>
> Sorry for not giving the Xapian+Omega version; it is 1.2.5.
>
> Best
>
> Charles
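As a rough check of the quoted vmstat figures, the Python sketch below computes how far the free column fell during the Tika run and how much memory that was neither free nor buffer/page cache grew. It assumes vmstat's default KiB units and that nothing else of note was running on the VM; the helper names are hypothetical, and because vmstat only samples at intervals, a short spike between samples would be missed.

```python
# Sketch only: estimate how much memory the Tika run took, using the rows
# of the vmstat output quoted above. Assumes vmstat's default units (KiB)
# and that nothing else of note was running on the VM at the time.
# The helper names are hypothetical.

# (free, buff, cache) columns, in KiB, in the order vmstat printed them.
VMSTAT_ROWS = [
    (762788, 312, 103696),
    (762788, 312, 103696),
    (762788, 312, 103696),
    (755276, 312, 110404),
    (744872, 312, 116084),
    (734268, 312, 119868),
    (723720, 312, 122416),
    (708592, 312, 122476),
    (702956, 312, 126272),
    (698692, 312, 127560),
    (735476, 312, 130472),
    (735476, 312, 130472),
    (735476, 312, 130472),
    (735476, 312, 130472),
]


def peak_free_drop_kib(rows):
    """Largest fall in the 'free' column relative to the first sample."""
    return rows[0][0] - min(free for free, _buff, _cache in rows)


def peak_used_growth_kib(rows):
    """Largest rise in memory that is neither free nor buffer/page cache,
    relative to the first sample. Total RAM is constant, so a fall in
    free + buff + cache is a rise in 'used' memory; page cache growth
    from reading the .doc file is therefore not counted."""
    baseline = sum(rows[0])
    return max(baseline - sum(row) for row in rows)


if __name__ == "__main__":
    free_drop = peak_free_drop_kib(VMSTAT_ROWS)
    used_growth = peak_used_growth_kib(VMSTAT_ROWS)
    print(f"peak drop in free memory:   {free_drop} KiB ({free_drop / 1024:.1f} MiB)")
    print(f"peak growth in used memory: {used_growth} KiB ({used_growth / 1024:.1f} MiB)")
```

On the posted rows this reports a peak fall in free memory of 64096 KiB (about 62.6 MiB) and a peak growth in used memory of 40232 KiB (about 39.3 MiB), both far below 1 GB.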
An update: further testing showed that the symptoms vary from run to
run, which casts doubt on the value of my earlier tests of single Tika
commands run independently of omindex.
For the record (I don't see how the mailing list can help with this) ...
I installed Tika 0.9 and tried again with the VirtualBox VM memory at
1024 MB.
Ran omindex several times, running "rm /var/lib/omega/data/docoll/*"
before each run. The output varied between runs, each run producing zero
to two glibc error messages. These samples were gathered over around 20
runs:
*** glibc detected *** java: double free or corruption (!prev): 0x0000000000642b40 ***
*** glibc detected *** java: free(): invalid pointer: 0x000000000242e460 ***
*** glibc detected *** java: double free or corruption (!prev): 0x0000000001697b40 ***
*** glibc detected *** java: double free or corruption (fasttop): 0x0000000000b33d50 ***
*** glibc detected *** java: double free or corruption (!prev): 0x0000000000f03b30 ***
*** glibc detected *** java: free(): invalid pointer: 0x0000000000dfe440 ***
The failure percentages for each file type also varied, although they
mostly stayed the same (0% or 100%), like this:
doc files: tried: 134, failed: 76 100.00%
docx files: tried: 1, failed: 0
odp files: tried: 1, failed: 0
ods files: tried: 23, failed: 0
odt files: tried: 71, failed: 0
pdf files: tried: 81, failed: 81 100.00%
ppt files: tried: 4, failed: 4 100.00%
rtf files: tried: 2, failed: 2 100.00%
xls files: tried: 27, failed: 27 100.00%
However, in other runs the .doc failure rate was 47.76%, 56.17% or
60.44%, and the .xls failure rate was once 3.7%.
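The percentages in the summary are failed / tried x 100. The Python sketch below reproduces summary lines in that format from per-extension counts; `summary_line` is a hypothetical name, since the script that produced the original summary is not shown. As plain arithmetic, 1 failure out of 27 .xls files would give 3.70%, matching the 3.7% seen in one run, and 64 failures out of 134 .doc files would give 47.76%.

```python
# Sketch only: reproduce summary lines in the format shown above from
# per-extension (tried, failed) counts. The function name is hypothetical;
# the script that produced the original summary is not shown in the post.


def summary_line(ext: str, tried: int, failed: int) -> str:
    """Format one line; the failure rate is appended only when failed > 0."""
    line = f"{ext} files: tried: {tried}, failed: {failed}"
    if failed:
        # Failure rate = failed / tried, as a percentage to two decimals.
        line += f" {100.0 * failed / tried:.2f}%"
    return line


if __name__ == "__main__":
    # Counts taken from the summary above.
    for ext, tried, failed in [
        ("docx", 1, 0), ("odp", 1, 0), ("ods", 23, 0), ("odt", 71, 0),
        ("pdf", 81, 81), ("ppt", 4, 4), ("rtf", 2, 2), ("xls", 27, 27),
    ]:
        print(summary_line(ext, tried, failed))
    # Hypothetical partial-failure cases matching the varying percentages.
    print(summary_line("xls", 27, 1))
    print(summary_line("doc", 134, 64))
```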
There are no problems logged in the /var/log/* files of either the
VirtualBox host or the VirtualBox VM test system. The host has ECC
memory and ECC reporting enabled in the kernel and userspace. The host
is not exhibiting any evidence of instability. The test system is not
being used for anything except exploring this problem.
Best
Charles