[Xapian-discuss] Tika memory problems. Omindex restrictions?
Charles
xapian at catcons.co.uk
Sat Jun 18 17:09:07 BST 2011
On 18/06/11 21:08, Olly Betts wrote:
> On Sat, Jun 18, 2011 at 08:23:28PM +0530, Charles wrote:
> [snip]
> To prevent issues with run-away filters, they're limited to the size of
> physical memory and 5 minutes of CPU time.
>
> If Tika's really using > 1GB of memory to extract files under 1MB, it
> seems that's going to be problematic on a system with 1GB of memory.
>
> What Xapian version are you using? Older versions of Omega had a
> bug which based the limit on free memory, which on Linux excludes
> that used for caching, often leaving a very small amount of memory
> apparently free.
>
> Cheers,
> Olly
Thanks Olly -- that was quick! :-)
It doesn't look as if Tika is using > 1 GB of memory. Below is the vmstat
output from running, directly at a command prompt, a Tika command that
failed when run by omindex. The command was:

  java -jar /opt/apache/tika/apache-tika-0.8-src/tika-app/target/tika-app-0.8.jar --text <whatever>.doc

The .doc file was ~4 MB:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy  id wa
 0  0      0 762788    312 103696    0    0     0     0   55   60  0  1  99  0
 0  0      0 762788    312 103696    0    0     0     0   85   98  1  0  99  0
 0  0      0 762788    312 103696    0    0     0     0   34   37  0  0 100  0
 1  1      0 755276    312 110404    0    0  6696     4  256  221  2  5  69 24
 1  0      0 744872    312 116084    0    0  5700     0  659  635  6 18  53 23
 1  0      0 734268    312 119868    0    0  3784     0  688  618 17 16  42 25
 3  0      0 723720    312 122416    0    0  2560     0  586  322 47 12  30 11
 3  0      0 708592    312 122476    0    0    40     0  689  352 81  7  12  0
 1  0      0 702956    312 126272    0    0  3840     0  620  464 20 16  41 24
 3  0      0 698692    312 127560    0    0  1300     0  601  318 80  8  12  1
 0  0      0 735476    312 130472    0    0  2856     0  525  459 35 13  51  1
 0  0      0 735476    312 130472    0    0     0     0   39   42  0  0 100  0
 0  0      0 735476    312 130472    0    0     0     0   32   36  0  0 100  0
 0  0      0 735476    312 130472    0    0     0     0   31   37  0  0 100  0
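
To put a number on that, here is a small sketch that takes the free and cache columns (in KB) from the vmstat samples above and reports how far free memory fell relative to the first, idle sample. The function names are illustrative only. Note that vmstat reports system-wide physical memory, so this only approximates what the Tika process itself touched and says nothing about how much virtual address space it reserved.

```python
# "free" and "cache" columns (KB) copied from the vmstat samples, in order.
free_kb = [762788, 762788, 762788, 755276, 744872, 734268, 723720,
           708592, 702956, 698692, 735476, 735476, 735476, 735476]
cache_kb = [103696, 103696, 103696, 110404, 116084, 119868, 122416,
            122476, 126272, 127560, 130472, 130472, 130472, 130472]


def peak_drop_kb(samples):
    """Largest drop relative to the first (idle) sample."""
    return samples[0] - min(samples)


# Drop in free memory alone, and with page-cache growth discounted
# (free + cache), since reading the .doc file also fills the cache.
drop_free = peak_drop_kb(free_kb)
drop_free_plus_cache = peak_drop_kb([f + c for f, c in zip(free_kb, cache_kb)])

print(f"peak drop in free:         {drop_free} KB (~{drop_free / 1024:.1f} MB)")
print(f"peak drop in free + cache: {drop_free_plus_cache} KB "
      f"(~{drop_free_plus_cache / 1024:.1f} MB)")
```

On these samples the drop is 64096 KB (about 62.6 MB) in free memory, or 40232 KB (about 39.3 MB) once cache growth is discounted, which is nowhere near 1 GB.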
Sorry for not giving the Xapian+Omega version; it is 1.2.5.
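
One way to check whether the limits Olly describes are what makes the command fail would be to re-run it from a prompt under similar limits. The following is a minimal sketch, not Omega's actual code: it assumes the limits behave like `setrlimit` caps on address space (`RLIMIT_AS`) and CPU time (`RLIMIT_CPU`), which the thread does not confirm, and `run_limited` and `physical_memory_bytes` are hypothetical helper names. It is Linux-specific.

```python
import os
import resource
import subprocess
import sys


def physical_memory_bytes():
    """Total physical memory, as reported by sysconf (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def _cap(kind, value):
    # Lower only the soft limit, never exceeding the existing hard limit.
    _, hard = resource.getrlimit(kind)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(kind, (value, hard))


def run_limited(argv, mem_bytes, cpu_seconds=300):
    """Run argv with address-space and CPU-time caps; return its exit status.

    Assumption: the filter limits act like RLIMIT_AS and RLIMIT_CPU.
    """
    def apply_limits():
        # Runs in the child process just before the command is exec'd.
        _cap(resource.RLIMIT_AS, mem_bytes)
        _cap(resource.RLIMIT_CPU, cpu_seconds)

    return subprocess.run(argv, preexec_fn=apply_limits).returncode
```

Calling `run_limited` with the java command line quoted earlier and `physical_memory_bytes()` as the cap would show whether the command still succeeds under those limits. If it fails only when capped, that would suggest the process reserves more address space than the vmstat figures indicate, which a JVM may well do.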
Best
Charles