[Xapian-discuss] Tika memory problems. Omindex restrictions?
Charles
xapian at catcons.co.uk
Mon Jun 20 04:41:27 BST 2011
On 20/06/11 07:08, Olly Betts wrote:
> If you get this working, it would be useful to know how to configure
> omindex to use Tika - is it just a matter of specifying the command you
> mentioned in a previous mail with --filter?
Hello :-)
The other matters will take a while to investigate, so I will answer this
question now. It worked on the test system at first and is working in
production (the production server has 4 GB of memory). The omindex
command used is:
omindex --db /var/lib/omega/data/docoll/ \
--filter 'application/msword:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/octet-stream:strings -n8' \
--filter 'application/pdf:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/vnd.ms-excel:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/vnd.ms-powerpoint:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/x-gzip:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/xml:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/x-rar:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/x-zip:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'text/plain:cat' \
--filter 'text/rtf:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'text/x-c:strings -n8' \
--filter 'text/x-c++:strings -n8' \
--stemmer=english \
--url / \
/srv/docoll/*
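Because the same Tika command line is repeated for almost every document type, a small wrapper can generate the --filter options from lists of MIME types. This is a minimal sketch, assuming bash and assuming the order of the --filter options does not matter (each one names a different MIME type). The variable names and the RUN switch are hypothetical; the paths, MIME types and filter commands are copied from the command above. By default it only prints the arguments so they can be checked before indexing.

```shell
#!/usr/bin/env bash
# Sketch only: builds the omindex arguments used above from MIME-type lists.
# Variable names and the RUN switch are hypothetical; paths are from the mail.
set -euo pipefail

TIKA_CMD='java -jar /opt/apache/tika/tika-app-0.9.jar --text'

# MIME types whose text is extracted by Tika.
tika_types=(
  application/msword
  application/pdf
  application/vnd.ms-excel
  application/vnd.ms-powerpoint
  application/x-gzip
  application/xml
  application/x-rar
  application/x-zip
  text/rtf
)

# MIME types handled by "strings -n8".
strings_types=(
  application/octet-stream
  text/x-c
  text/x-c++
)

args=(--db /var/lib/omega/data/docoll/)
for t in "${tika_types[@]}"; do
  args+=(--filter "${t}:${TIKA_CMD}")
done
for t in "${strings_types[@]}"; do
  args+=(--filter "${t}:strings -n8")
done
# Plain text is passed through unchanged; the glob is expanded by bash,
# exactly as in the original command line.
args+=(--filter 'text/plain:cat' --stemmer=english --url / /srv/docoll/*)

# Print the arguments by default; set RUN=1 to actually invoke omindex.
if [[ "${RUN:-0}" == 1 ]]; then
  omindex "${args[@]}"
else
  printf '%s\n' "${args[@]}"
fi
```

Keeping each MIME type on its own line in an array also makes accidental duplicates, such as a repeated x-rar filter, easier to spot.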
Best
Charles