[Xapian-tickets] [Xapian] #324: A Script that uses OpenOffice to filter text for Xapian Omega
Xapian
nobody at xapian.org
Fri Jun 12 05:33:53 BST 2009
#324: A Script that uses OpenOffice to filter text for Xapian Omega
-------------------------+--------------------------------------------------
Reporter: frankjb | Owner: olly
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Omega | Version:
Severity: normal | Keywords: open office convert
Blockedby: | Platform: Linux
Blocking: |
-------------------------+--------------------------------------------------
Changes (by frankjb):
* keywords: => open office convert
Comment:
Found a macro and created a bash script that seems to work ok as a proof
of concept.
It doesn't require OpenOffice to be running in the background, unlike my
previous script.
At this stage the macro converts pdf, ppt, pps, doc and xls files to HTML,
since OpenOffice supports HTML export for almost everything, while only
certain document types can be exported to text.
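For readers without the attachment, here is a rough sketch of how a wrapper for such a macro might look. It is not the script attached to the ticket. The macro name `Standard.Convert.SaveAsHTML` and both function names are made up for illustration, the extension list follows the comment (with `ppt` for PowerPoint), and the `soffice -invisible "macro:///..."` form is an assumption based on the usual OpenOffice command line of that era. The sketch only prints the command it would run, so it can be tried without OpenOffice installed.

```shell
#!/bin/bash
# Sketch only: MACRO is a placeholder, not the macro from the ticket.
MACRO="Standard.Convert.SaveAsHTML"   # hypothetical Library.Module.Routine

# Succeed only for the file types the comment says the macro handles.
is_convertible() {
    local ext
    # Take the text after the last dot and lower-case it.
    ext=$(printf '%s' "${1##*.}" | tr 'A-Z' 'a-z')
    case "$ext" in
        pdf|ppt|pps|doc|xls) return 0 ;;
        *) return 1 ;;
    esac
}

# Print (do not run) the command line that would start OpenOffice
# without a window and hand the file path to the macro.
# Paths containing parentheses or quotes would need extra escaping.
build_command() {
    printf 'soffice -invisible "macro:///%s(%s)"\n' "$MACRO" "$1"
}

# Dry run over the files given on the command line.
for f in "$@"; do
    if is_convertible "$f"; then
        build_command "$f"
    else
        echo "skipping $f: extension not in the list" >&2
    fi
done
```

Keeping the extension check separate from the command builder means the list of supported types can change without touching the OpenOffice invocation.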
I guess we can use Xapian's method for parsing HTML in conjunction with
this, although what's in the bash script would probably be better off
converted to C (which I'm not any good at).
I've commented the bash script so it should be easy enough to follow.
--
Ticket URL: <http://trac.xapian.org/ticket/324#comment:2>
Xapian <http://xapian.org/>