[Xapian-discuss] Tika 0.8 failure rates

Charles xapian at catcons.co.uk
Tue Aug 9 16:44:20 BST 2011


Hello :-)

FYI, here is a list of apparent Tika 0.8 conversion failures when run
from Xapian's omindex on a Debian 6 Squeeze 64-bit system with 4 GB memory:

 doc files: tried: 10268, failed: 345  3.35%
docx files: tried:   248, failed:   0
 odp files: tried:     7, failed:   0
 ods files: tried:    71, failed:   0
 odt files: tried:   136, failed:   0
 pdf files: tried:  3888, failed: 150  3.85%
 pps files: tried:    29, failed:   3 10.34%
ppsx files: tried:    12, failed:   0
 ppt files: tried:   331, failed:   0
pptx files: tried:    24, failed:   0
 rtf files: tried:   698, failed:   1  0.14%
 xls files: tried:  3339, failed:   2  0.05%
xlsx files: tried:    63, failed:   0

The statistics were generated by searching omindex output for
.$ext" failed
where $ext was each of the listed extensions in turn.
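
For anyone wanting to repeat the count, here is a minimal sketch of the search described above, in Python. It is not the script that was actually used. It assumes the omindex output has been saved to a file and that a failure line contains the literal text .$ext" failed; the function names and the sample log lines are made up for illustration and are not real omindex output. How the "tried" figures were obtained is not stated above, so the rate helper simply takes them as input.

```python
# Hypothetical helper names; only the search string (.$ext" failed) comes from the post.
EXTENSIONS = ["doc", "docx", "odp", "ods", "odt", "pdf", "pps",
              "ppsx", "ppt", "pptx", "rtf", "xls", "xlsx"]


def count_failures(log_lines, extensions=EXTENSIONS):
    """Count, per extension, the lines containing the literal text .<ext>" failed."""
    counts = {ext: 0 for ext in extensions}
    for line in log_lines:
        for ext in extensions:
            # The closing quote keeps .doc from also matching .docx, and so on.
            if f'.{ext}" failed' in line:
                counts[ext] += 1
    return counts


def failure_rate(tried, failed):
    """Return the failure percentage, or 0.0 when nothing was tried."""
    return 100.0 * failed / tried if tried else 0.0


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        'Indexing "/docs/a.doc" failed',
        'Indexing "/docs/b.docx" done',
        'Indexing "/docs/c.pdf" failed',
    ]
    failures = count_failures(sample)
    print(failures["doc"], failures["docx"], failures["pdf"])
    print(f"{failure_rate(29, 3):.2f}%")
```

With a real log, passing open("omindex.log") (a hypothetical file name) as log_lines should work, since a file object iterates line by line.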

More information can be supplied on request.

Best

Charles




