[Xapian-devel] PPT text extractor
jf at dockes.org
Thu Dec 12 14:11:29 GMT 2013
Hi,
I've had a heads up from a user that catppt did not work at all on
semi-recent PowerPoint files (ppt, not pptx). I checked, and indeed it
misses most of the content on many files.
After looking around, I found Python code from the libreoffice project
which makes a nice ppt text extractor after adding a very thin command line
wrapper:
http://cgit.freedesktop.org/libreoffice/contrib/mso-dumper/
It's pure Python with no other dependencies, orders of magnitude faster than
unoconv, and, unlike catppt, it does extract the text...
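
To illustrate how an indexer could use such a wrapper as an external filter,
here is a minimal sketch. It assumes only that the wrapper takes the file path
as its last argument and writes the extracted text to stdout; the function
name extract_text and the timeout default are hypothetical, and none of this
is taken from Omega or mso-dumper:

```python
import subprocess
import sys


def extract_text(command, path, timeout=60):
    """Run an external extractor on `path` and return what it prints.

    `command` is the extractor's argv as a list (assumption: the file
    path is appended as the final argument and the text goes to stdout).
    """
    result = subprocess.run(
        list(command) + [path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,   # give up on files that make the extractor hang
        check=True,        # raise CalledProcessError on a non-zero exit
    )
    # Decode leniently so one bad byte does not lose the whole document.
    return result.stdout.decode("utf-8", errors="replace")
```

With a wrapper script saved as, say, ppt2text.py (a made-up name), the call
would be extract_text([sys.executable, "ppt2text.py"], "slides.ppt").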
Just in case this can be useful to Omega... I can provide more details of
course.
Cheers,
jf