[Xapian-discuss] Tika memory problems. Omindex restrictions?

Charles xapian at catcons.co.uk
Sun Jun 19 18:21:40 BST 2011


On 18/06/11 21:39, Charles wrote:
> Thanks Olly -- that was quick!  :-)
>
> It doesn't look as if Tika is using > 1 GB memory.  Here's vmstat 
> output when running a Tika command that failed when run by omindex, 
> running it directly  at a command prompt.  The command was java -jar 
> /opt/apache/tika/apache-tika-0.8-src/tika-app/target/tika-app-0.8.jar 
> --text <whatever>.doc.  The .doc file was ~4 MB:
>
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>
>  0  0      0 762788    312 103696    0    0     0     0   55   60  0  1 99  0
>  0  0      0 762788    312 103696    0    0     0     0   85   98  1  0 99  0
>  0  0      0 762788    312 103696    0    0     0     0   34   37  0  0 100  0
>  1  1      0 755276    312 110404    0    0  6696     4  256  221  2  5 69 24
>  1  0      0 744872    312 116084    0    0  5700     0  659  635  6 18 53 23
>  1  0      0 734268    312 119868    0    0  3784     0  688  618 17 16 42 25
>  3  0      0 723720    312 122416    0    0  2560     0  586  322 47 12 30 11
>  3  0      0 708592    312 122476    0    0    40     0  689  352 81  7 12  0
>  1  0      0 702956    312 126272    0    0  3840     0  620  464 20 16 41 24
>  3  0      0 698692    312 127560    0    0  1300     0  601  318 80  8 12  1
>  0  0      0 735476    312 130472    0    0  2856     0  525  459 35 13 51  1
>  0  0      0 735476    312 130472    0    0     0     0   39   42  0  0 100  0
>  0  0      0 735476    312 130472    0    0     0     0   32   36  0  0 100  0
>  0  0      0 735476    312 130472    0    0     0     0   31   37  0  0 100  0
>
> Sorry for not giving the Xapian+Omega version; it is 1.2.5.
>
> Best
>
> Charles

An update: further testing showed that the symptoms vary from run to run, 
which casts doubt on the value of my earlier tests of single Tika commands 
run independently of omindex.

For the record (I don't see how the mailing list can help with this) ...

I installed Tika 0.9 and tried again with the VirtualBox VM memory at 
1024 MB.

Ran omindex several times, running "*rm /var/lib/omega/data/docoll/*" 
before each run.  The output varied, including zero to two messages from 
glibc such as these samples gathered over around 20 runs:

*** glibc detected *** java: double free or corruption (!prev): 0x0000000000642b40 ***
*** glibc detected *** java: free(): invalid pointer: 0x000000000242e460 ***
*** glibc detected *** java: double free or corruption (!prev): 0x0000000001697b40 ***
*** glibc detected *** java: double free or corruption (fasttop): 0x0000000000b33d50 ***
*** glibc detected *** java: double free or corruption (!prev): 0x0000000000f03b30 ***
*** glibc detected *** java: free(): invalid pointer: 0x0000000000dfe440 ***
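
In case it helps anyone comparing runs, here is a minimal sketch of how 
such messages could be tallied by error type from captured omindex output. 
It is not part of omindex or Tika. The function name tally_glibc_errors and 
the idea of saving the output of each run to a string are assumptions made 
for illustration only.

```python
import re
from collections import Counter

# Matches glibc heap-corruption reports of the form seen in this thread, e.g.
#   *** glibc detected *** java: free(): invalid pointer: 0x... ***
GLIBC_RE = re.compile(
    r"\*\*\* glibc detected \*\*\* (?P<prog>\S+): "
    r"(?P<error>.+?): (?P<addr>0x[0-9a-fA-F]+) \*\*\*"
)


def tally_glibc_errors(text):
    """Count glibc reports in captured output, grouped by error description.

    Hypothetical helper: the caller is assumed to have saved the combined
    output of an omindex run into `text`.
    """
    # Rejoin reports that a mail client or terminal wrapped before the address.
    joined = re.sub(r":\s*\n\s*(0x[0-9a-fA-F]+)", r": \1", text)
    return Counter(m.group("error") for m in GLIBC_RE.finditer(joined))
```

Applied to the six samples above, this gives three "double free or 
corruption (!prev)", two "free(): invalid pointer" and one "double free or 
corruption (fasttop)".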

The failure percentage for each file type also varied between runs, 
although it mostly stayed at 0 or 100%, like this:

  doc files: tried: 134, failed: 76 100.00%
 docx files: tried:   1, failed:  0
  odp files: tried:   1, failed:  0
  ods files: tried:  23, failed:  0
  odt files: tried:  71, failed:  0
  pdf files: tried:  81, failed: 81 100.00%
  ppt files: tried:   4, failed:  4 100.00%
  rtf files: tried:   2, failed:  2 100.00%
  xls files: tried:  27, failed: 27 100.00%

However, the .doc failure rate also came out at 47.76%, 56.17% and 60.44% 
in some runs, and .xls once showed 3.7%.
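
For anyone wanting to reproduce this kind of summary, the following is a 
rough sketch of how a per-type failure percentage could be computed from 
tried and failed counts. It is not the script I used; the names 
failure_rate and summary_line and the exact column widths are assumptions 
for illustration.

```python
def failure_rate(tried, failed):
    """Return failures as a percentage of attempts, or None if nothing was tried."""
    if tried == 0:
        return None
    return 100.0 * failed / tried


def summary_line(ext, tried, failed):
    """Format one summary row; the percentage is shown only when something failed.

    Hypothetical layout, loosely modelled on the rows quoted in this message.
    """
    line = f"{ext:>5} files: tried: {tried:3d}, failed: {failed:2d}"
    if failed:
        line += f" {failure_rate(tried, failed):.2f}%"
    return line
```

For instance, a single failure in 27 attempts works out to about 3.70%, 
and 81 failures in 81 attempts to 100.00%.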

There are no problems logged in the /var/log/* files of either the 
VirtualBox host or the VirtualBox VM test system.  The host has ECC 
memory and ECC reporting enabled in the kernel and userspace.  The host 
is not exhibiting any evidence of instability.  The test system is not 
being used for anything except exploring this problem.

Best

Charles
