[Xapian-tickets] [Xapian] #324: A Script that uses OpenOffice to filter text for Xapian Omega

Xapian nobody at xapian.org
Fri Jun 12 05:33:53 BST 2009


#324: A Script that uses OpenOffice to filter text for Xapian Omega
-------------------------+--------------------------------------------------
 Reporter:  frankjb      |       Owner:  olly                
     Type:  enhancement  |      Status:  new                 
 Priority:  normal       |   Milestone:                      
Component:  Omega        |     Version:                      
 Severity:  normal       |    Keywords:  open office  convert
Blockedby:               |    Platform:  Linux               
 Blocking:               |  
-------------------------+--------------------------------------------------
Changes (by frankjb):

  * keywords:  => open office  convert


Comment:

 Found a macro and created a bash script that seems to work OK as a proof
 of concept.
 Unlike my previous script, it doesn't require OpenOffice to be running in
 the background.

 At this stage the macro converts PDF, PPT, PPS, DOC and XLS files to HTML,
 because OpenOffice supports HTML export for almost everything, while only
 certain document types can be exported to text.
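
 As a rough illustration of the flow described above (not the script
 attached to the ticket), a wrapper could check the file extension and then
 build an OpenOffice command line that runs a conversion macro. The macro
 name Standard.Convert.ToHTML and both function names are hypothetical, and
 the single-dash -invisible and -headless switches assume an OpenOffice 3.x
 style command line.

```shell
#!/bin/bash
# Sketch only. The macro name is a hypothetical placeholder, not the macro
# attached to this ticket; override SOFFICE and MACRO to suit your install.
SOFFICE="${SOFFICE:-soffice}"
MACRO="${MACRO:-Standard.Convert.ToHTML}"

# Succeed only for file names ending in one of the extensions the comment lists.
is_convertible() {
    case "$1" in
        *.pdf|*.ppt|*.pps|*.doc|*.xls) return 0 ;;
        *) return 1 ;;
    esac
}

# Print (rather than run) the command that would ask OpenOffice to execute
# the macro on one file, so the sketch is safe without OpenOffice installed.
build_command() {
    printf '%s -invisible -headless "macro:///%s(%s)"\n' "$SOFFICE" "$MACRO" "$1"
}
```

 Printing the command instead of executing it keeps the sketch easy to try
 out; a real filter would run it and then hand the resulting HTML file to
 the HTML parsing step.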

 I guess we can use Xapian's method for parsing HTML in conjunction with
 this, although what's in the bash script would probably be better
 converted to C (which I'm not any good at).

 I've commented the bash script so it should be easy enough to follow.

-- 
Ticket URL: <http://trac.xapian.org/ticket/324#comment:2>
Xapian <http://xapian.org/>