[Xapian-discuss] Tika memory problems. Omindex restrictions?

Charles xapian at catcons.co.uk
Mon Jun 20 04:41:27 BST 2011


On 20/06/11 07:08, Olly Betts wrote:
> If you get this working, it would be useful to know how to configure
> omindex to use Tika - is it just a matter of specifying the command you
> mentioned in a previous mail with --filter?
Hello :-)

The other matters will take a while to investigate, so I will answer this
question now.  It worked on the test system at first and is working
in production (the production server has 4 GB of memory).  The omindex
command used is:

omindex --db /var/lib/omega/data/docoll/ \
--filter 'application/msword:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/octet-stream:strings -n8' \
--filter 'application/pdf:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/vnd.ms-excel:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/vnd.ms-powerpoint:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/x-gzip:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/xml:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/x-rar:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/x-rar:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'application/x-zip:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'text/plain:cat' \
--filter 'text/rtf:java -jar /opt/apache/tika/tika-app-0.9.jar --text' \
--filter 'text/x-c:strings -n8' \
--filter 'text/x-c++:strings -n8' \
--stemmer=english \
--url / \
/srv/docoll/*
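
Since most of the filters repeat the same Tika command, the argument list
could also be generated by a small script.  What follows is only a sketch
under that assumption, not what was actually run: the variable names
(tika, tika_types, strings_types) are made up for illustration, the paths
are the ones from the command above, and it only prints the arguments
instead of calling omindex.

```shell
#!/bin/sh
# Sketch: build the omindex argument list from lists of MIME types.
# Variable names are illustrative; paths are copied from the command above.
tika='java -jar /opt/apache/tika/tika-app-0.9.jar --text'

# MIME types handed to Tika (application/x-rar is listed once here).
tika_types='application/msword application/pdf application/vnd.ms-excel
application/vnd.ms-powerpoint application/x-gzip application/xml
application/x-rar application/x-zip text/rtf'

# MIME types handed to strings(1).
strings_types='application/octet-stream text/x-c text/x-c++'

# Accumulate the arguments in the positional parameters so that filter
# commands containing spaces stay single arguments.
set -- --db /var/lib/omega/data/docoll/
for t in $tika_types; do
  set -- "$@" --filter "$t:$tika"
done
for t in $strings_types; do
  set -- "$@" --filter "$t:strings -n8"
done
set -- "$@" --filter 'text/plain:cat' --stemmer=english --url /

# Dry run: print one argument per line.  To index for real, replace this
# loop with:  omindex "$@" /srv/docoll/*
for a in "$@"; do
  printf '%s\n' "$a"
done
```

Keeping the MIME types in two lists means a new Tika version or an extra
type is a one-line change, and the dry run makes it easy to check the
quoting before anything is indexed.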


Best

Charles



