[Xapian-tickets] [Xapian] #324: A Script that uses OpenOffice to filter text for Xapian Omega
Xapian
nobody at xapian.org
Mon Feb 2 21:38:19 GMT 2009
#324: A Script that uses OpenOffice to filter text for Xapian Omega
-------------------------+--------------------------------------------------
Reporter: frankjb | Owner: olly
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Examples | Version:
Severity: normal | Blockedby:
Platform: Linux | Blocking:
-------------------------+--------------------------------------------------
This Python script is an example of how to use OpenOffice to convert
documents to text. It starts a headless instance of OpenOffice, which
should stay running; if it is not running, the script attempts to start a
new instance. It also uses unoconv, which can be downloaded from
http://dag.wieers.com/home-made/unoconv/.
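A minimal sketch of the "is soffice running, otherwise start it" check, done the way the ticket describes (by parsing `ps` output). The function names are hypothetical, not taken from oOC.py. The `-headless`, `-invisible` and `-accept` flags are the OpenOffice.org 2.x/3.x spelling (newer LibreOffice uses `--headless` and `--accept`), and port 2002 is an assumption, so check both against your installation.

```python
import subprocess

# Assumed UNO connection string; adjust the port to match your unoconv setup.
ACCEPT = "socket,host=localhost,port=2002;urp;"

def soffice_running(ps_output):
    """Return True if any line of `ps -eo comm` output names an soffice process."""
    for line in ps_output.splitlines():
        # Depending on the install, the process shows up as soffice or soffice.bin.
        if line.strip() in ("soffice", "soffice.bin"):
            return True
    return False

def soffice_command(accept=ACCEPT):
    """Build the argv for a headless OpenOffice.org listener (OOo 2.x/3.x flag style)."""
    return ["soffice", "-headless", "-invisible", "-accept=" + accept]

def ensure_soffice():
    """Start soffice in the background unless ps shows one is already running."""
    ps = subprocess.run(["ps", "-eo", "comm"], capture_output=True, text=True).stdout
    if not soffice_running(ps):
        subprocess.Popen(soffice_command())
```

Keeping the parsing in a pure function means it can be checked against captured `ps` output without OpenOffice installed. Note that a check-then-start sequence like this can race if two filter runs start at the same moment.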
Because unoconv works out the input format itself, you should be able to
slot the script into omindex in place of any of the existing filters
without too much hassle. For example, I replaced antiword in omindex.cc
with oOC.py (this script) because antiword couldn't open .doc files saved
from WordPerfect.
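omindex runs filters such as antiword as external commands and reads the extracted text from their standard output, so a drop-in replacement mainly needs to take a filename and print plain text. The sketch below assumes unoconv's `-f txt` (output format) and `--stdout` options; `filter_command` and `main` are hypothetical names, not the contents of oOC.py.

```python
import subprocess
import sys

def filter_command(path):
    """Build the unoconv argv that converts `path` to plain text on stdout."""
    # unoconv detects the input format itself, so only the output format is given.
    return ["unoconv", "-f", "txt", "--stdout", path]

def main(argv):
    """Filter entry point: expects exactly one argument, the file to convert."""
    if len(argv) != 2:
        sys.stderr.write("usage: %s FILE\n" % argv[0])
        return 2
    # unoconv writes straight to our stdout, which is what omindex reads.
    return subprocess.call(filter_command(argv[1]))
```

To run it as a standalone filter, finish the file with the usual `if __name__ == "__main__": sys.exit(main(sys.argv))` guard and point omindex.cc at the script where it previously invoked antiword.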
I would love to see some serious stability and performance testing of
OpenOffice as a filter. I couldn't work out how to get Python to manage
the soffice process properly, so I parsed the output of the ps command
instead. Maybe one of the Python gurus could have a look :)
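One common alternative to parsing `ps` output is to keep the `subprocess.Popen` handle for the process you started and ask it directly: `Popen.poll()` returns `None` while the child is still running. A minimal sketch with a hypothetical `Supervisor` class; in real use `argv` would be the headless soffice command line. This only tracks a process the script itself started, and it assumes the launched command stays in the foreground: if your soffice wrapper hands off to another process and exits, `poll()` will report it as finished.

```python
import subprocess

class Supervisor:
    """Keep one child process alive, restarting it if it has exited."""

    def __init__(self, argv):
        self.argv = argv
        self.proc = None

    def running(self):
        # poll() returns None while the child is alive, otherwise its exit code.
        return self.proc is not None and self.proc.poll() is None

    def ensure(self):
        """Start the child if it is not running; return its Popen handle."""
        if not self.running():
            self.proc = subprocess.Popen(self.argv)
        return self.proc

    def stop(self):
        """Terminate the child (if alive) and reap it."""
        if self.running():
            self.proc.terminate()
            self.proc.wait()
```

Because the handle is held in memory, this suits a long-lived wrapper process better than a filter that omindex spawns afresh for every file.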
--
Ticket URL: <http://trac.xapian.org/ticket/324>
Xapian <http://xapian.org/>