[Xapian-devel] PPT text extractor

jf at dockes.org
Thu Dec 12 14:11:29 GMT 2013


Hi,

A user gave me a heads-up that catppt does not work at all on semi-recent
PowerPoint files (ppt, not pptx). I checked, and it does indeed miss most
of the content in many files.

After looking around, I found Python code from the LibreOffice project
which makes a nice ppt text extractor once you add a very thin
command-line wrapper:

  http://cgit.freedesktop.org/libreoffice/contrib/mso-dumper/

It's pure Python with no other dependencies, orders of magnitude faster
than unoconv, and, unlike catppt, it does extract the text...
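
To give an idea of how such an extractor could be called from an indexing
filter, here is a rough sketch, not the actual wrapper. It only shows the
calling side: run an external command on a file and collect its standard
output as text. The function name extract_text and the command name
"ppt-dump-text" in the docstring are placeholders I made up for
illustration; they are not part of mso-dumper or Omega.

```python
import subprocess


def extract_text(command, path, timeout=60):
    """Run an external text extractor on `path` and return its output.

    `command` is an argument list, for example ["ppt-dump-text"].
    That name is a placeholder (an assumption), not a real script
    shipped by mso-dumper.
    """
    result = subprocess.run(
        command + [path],          # the file path is passed as last argument
        stdout=subprocess.PIPE,    # capture the extracted text
        stderr=subprocess.PIPE,    # capture diagnostics for error reporting
        timeout=timeout,           # do not hang forever on a broken file
    )
    if result.returncode != 0:
        # Surface the extractor's own error message to the caller.
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    # Assume the extractor writes UTF-8; replace anything undecodable.
    return result.stdout.decode("utf-8", "replace")
```

The timeout and the non-zero exit check are there because an indexer
typically runs over many files of unknown quality, so one bad file should
fail cleanly rather than stall the whole run.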

Just in case this can be useful to Omega... I can provide more details of
course.

Cheers,

jf




