[Xapian-tickets] [Xapian] #324: A Script that uses OpenOffice to filter text for Xapian Omega

Xapian nobody at xapian.org
Mon Feb 2 21:38:19 GMT 2009


#324: A Script that uses OpenOffice to filter text for Xapian Omega
-------------------------+--------------------------------------------------
 Reporter:  frankjb      |       Owner:  olly
     Type:  enhancement  |      Status:  new 
 Priority:  normal       |   Milestone:      
Component:  Examples     |     Version:      
 Severity:  normal       |   Blockedby:      
 Platform:  Linux        |    Blocking:      
-------------------------+--------------------------------------------------
 This Python script is an example of how to use OpenOffice to convert
 documents to text.  It starts a headless instance of OpenOffice, which
 should remain running; if it is not running, the script attempts to start
 a new one.  It also uses Unoconv, which can be downloaded from
 http://dag.wieers.com/home-made/unoconv/.
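
 As a rough illustration of the approach (not the attached oOC.py itself),
 the sketch below builds the two command lines involved.  The function
 names are hypothetical, port 2002 is assumed because it is the port
 unoconv uses by default, and the single-dash soffice options are assumed
 to match OpenOffice.org 2.x/3.x.

```python
import subprocess

# Assumptions: port 2002 (unoconv's default) and the single-dash option
# style used by OpenOffice.org 2.x/3.x.
SOFFICE_CMD = [
    "soffice",
    "-headless",
    "-nofirststartwizard",
    "-accept=socket,host=localhost,port=2002;urp;",
]


def unoconv_command(path, fmt="txt"):
    """Build the unoconv argument list that writes `path` as `fmt` to stdout."""
    return ["unoconv", "--stdout", "-f", fmt, path]


def start_soffice():
    """Launch a headless OpenOffice listener and return its Popen handle."""
    return subprocess.Popen(SOFFICE_CMD)


def document_to_text(path):
    """Return the bytes unoconv writes to stdout when converting `path`."""
    result = subprocess.run(
        unoconv_command(path), stdout=subprocess.PIPE, check=True
    )
    return result.stdout
```

 With soffice and unoconv installed, document_to_text("report.doc") would
 return the converted text as bytes; a filter for omindex only needs to
 write those bytes to its standard output.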

 Unoconv doesn't need to be told what input format it is given, so you
 should be able to slot the script into omindex anywhere without too much
 hassle.  For example, I replaced antiword in omindex.cc with oOC.py (this
 script) because antiword couldn't open .doc files saved by WordPerfect.

 I would love to see some serious stability and performance testing of
 OpenOffice as a filter.  I couldn't figure out how to get Python to
 manage the soffice process correctly, so I parsed the output of the ps
 command instead.  Maybe one of the Python gurus could have a look :)
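
 For reference, the ps-based check described above could look roughly like
 the following.  This is a minimal sketch, not the code in oOC.py: the
 function names are hypothetical, and it assumes the process shows up with
 an executable name beginning with "soffice" (for example soffice.bin).

```python
import subprocess


def is_soffice_running(ps_output):
    """Return True if any command line in `ps_output` runs an soffice binary."""
    for line in ps_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first field is the executable; drop any directory prefix.
        name = fields[0].rsplit("/", 1)[-1]
        if name.startswith("soffice"):
            return True
    return False


def list_process_commands():
    """Return the command line of every process, one per line, via ps."""
    result = subprocess.run(
        ["ps", "-e", "-o", "args="],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

 A caller would test is_soffice_running(list_process_commands()) and start
 a new instance when it returns False.  Matching only on the executable
 name avoids false positives from unrelated commands that merely mention
 soffice in their arguments.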

-- 
Ticket URL: <http://trac.xapian.org/ticket/324>
Xapian <http://xapian.org/>