[Xapian-discuss] Tika 0.8 failure rates
Charles
xapian at catcons.co.uk
Tue Aug 9 16:44:20 BST 2011
Hello :-)
FYI, here are the apparent conversion failure counts, by file type, for
Tika 0.8 run from Xapian's omindex on a Debian 6 Squeeze 64-bit system
with 4 GB of memory:
doc files:   tried: 10268, failed: 345   3.35%
docx files:  tried:   248, failed:   0
odp files:   tried:     7, failed:   0
ods files:   tried:    71, failed:   0
odt files:   tried:   136, failed:   0
pdf files:   tried:  3888, failed: 150   3.85%
pps files:   tried:    29, failed:   3  10.34%
ppsx files:  tried:    12, failed:   0
ppt files:   tried:   331, failed:   0
pptx files:  tried:    24, failed:   0
rtf files:   tried:   698, failed:   1   0.14%
xls files:   tried:  3339, failed:   2   0.05%
xlsx files:  tried:    63, failed:   0
The statistics were generated by searching omindex output for
.$ext" failed
where $ext was each of the listed extensions in turn.
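For anyone wanting to repeat the count, here is a minimal sketch of that search in Python. It is not the script actually used; the function names and the sample log lines in the usage note are made up for illustration, and the only thing taken from the above is the literal pattern .$ext" failed.

```python
def count_failures(lines, extensions):
    """Count lines containing the literal text .<ext>" failed per extension.

    The match is a plain substring test, so .doc" failed does not
    also match lines ending in .docx" failed.
    """
    counts = {}
    for ext in extensions:
        needle = "." + ext + '" failed'
        counts[ext] = sum(1 for line in lines if needle in line)
    return counts


def failure_rate(tried, failed):
    """Return failures as a percentage of attempts (0.0 if nothing was tried)."""
    if tried == 0:
        return 0.0
    return 100.0 * failed / tried
```

Usage would be something like count_failures(open("omindex.log"), ["doc", "docx", "pdf"]), where omindex.log is a hypothetical file holding the captured omindex output. The "tried" totals have to come from elsewhere (for example a count of files per extension on disk), since the pattern only finds failures.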
More information can be supplied on request.
Best
Charles