[Xapian-devel] PPT text extractor

jf at dockes.org jf at dockes.org
Thu Dec 12 14:11:29 GMT 2013


A user gave me a heads-up that catppt does not work at all on
semi-recent PowerPoint files (ppt, not pptx). I checked, and it does
indeed miss most of the content in many files.

After looking around, I found Python code from the LibreOffice project
which makes a nice ppt text extractor after adding a very thin command line


It's pure Python with no other dependencies, orders of magnitude faster
than unoconv, and, unlike catppt, it does extract the text...
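
For illustration only, a thin command-line layer of this kind might look
like the sketch below. It is not the actual script: the names run and
extract_text, and the command name ppttext in the usage message, are
hypothetical, and extract_text is just a placeholder where the real PPT
parsing code would be called.

```python
import sys


def extract_text(data):
    """Hypothetical placeholder: the real PPT text extraction plugs in here."""
    raise NotImplementedError("replace with a call into the PPT parsing code")


def run(argv, extract=extract_text, out=sys.stdout, err=sys.stderr):
    """Write the text of each file named in argv[1:] to out.

    Returns an exit code: 0 on success, 1 if any file failed,
    2 if no file was given.
    """
    if len(argv) < 2:
        err.write("usage: ppttext file.ppt [...]\n")
        return 2
    status = 0
    for path in argv[1:]:
        try:
            # PPT is a binary format, so read raw bytes.
            with open(path, "rb") as f:
                data = f.read()
            out.write(extract(data))
            out.write("\n")
        except Exception as exc:
            # Report the failure and carry on with the remaining files.
            err.write("%s: %s\n" % (path, exc))
            status = 1
    return status
```

Ending the script with sys.exit(run(sys.argv)) would turn it into a
command that prints the extracted text on stdout, which is the usual
shape for an external text filter.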

Just in case this can be useful to Omega... I can provide more details of
