[Xapian-tickets] [Xapian] #383: Patch to replace antiword with abiword
Xapian
nobody at xapian.org
Fri Jun 12 15:42:58 BST 2009
#383: Patch to replace antiword with abiword
-------------------------+--------------------------------------------------
 Reporter:  frankjb      |       Owner:  olly
     Type:  enhancement  |      Status:  new
 Priority:  normal       |   Milestone:
Component:  Other        |     Version:
 Severity:  normal       |    Keywords:
Blockedby:               |    Platform:  All
 Blocking:               |
-------------------------+--------------------------------------------------
Comment(by olly):
Um, we've already had essentially this same discussion on the mailing list
(and you took part!)
http://thread.gmane.org/gmane.comp.search.xapian.general/7310/focus=7347
To summarise, the current status is:
* it appears antiword is unmaintained (that's not actually a huge issue
if it does the job, though it's certainly not a positive thing).
* openoffice could replace it, but we don't currently have a clean
solution, and openoffice is a rather heavyweight dependency, so it would
be better to have a lightweight default with openoffice as an option.
* abiword could also replace it, but it's also not terribly lightweight
(probably not as bad as openoffice though).
* we could easily replace antiword with wvWare, but wvWare is ~5 times
slower, which is a bit of a hit to take just to be using a more actively
maintained extractor - I feel there needs to be a more concrete benefit to
be gained to justify this.
* my testing (admittedly limited, as I don't have many .doc examples)
showed equally good results from antiword and wvWare (and abiword's output
was essentially identical to wvWare's, as you would expect).
* you've claimed that antiword fails to correctly extract text from some
documents, but didn't respond to my request for examples of such
documents, so it's hard for me to judge for myself how serious this is. I
haven't seen such reports from anyone else, but perhaps nobody else has
looked at antiword's output in detail.
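
For anyone wanting to check the speed and output claims above against their own .doc files, here is a rough sketch of a comparison harness. It is not part of Xapian or omindex; the function names (run_extractor, compare) are made up for illustration, and it assumes each extractor writes plain text to stdout when given the document path as its last argument. That holds for plain "antiword file.doc"; the abiword arguments shown are an assumption to verify locally, and wvWare's wvText writes to an output file rather than stdout, so it would need a small wrapper.

```python
import subprocess
import sys
import time


def run_extractor(argv, runs=3):
    """Run an extraction command `runs` times.

    Returns (best_wall_clock_seconds, stdout_text_of_last_run).
    """
    best = float("inf")
    text = ""
    for _ in range(runs):
        start = time.perf_counter()
        result = subprocess.run(
            argv, capture_output=True, text=True, errors="replace", check=True
        )
        best = min(best, time.perf_counter() - start)
        text = result.stdout
    return best, text


def compare(extractors, doc_path, runs=3):
    """Time each extractor on doc_path.

    `extractors` maps a label to an argv prefix; doc_path is appended to it.
    """
    return {
        name: run_extractor(list(prefix) + [doc_path], runs)
        for name, prefix in extractors.items()
    }


def main(args):
    if not args:
        print("usage: compare_extractors.py FILE.doc")
        return
    extractors = {
        "antiword": ["antiword"],
        # Assumption: these abiword options send text to stdout; check locally.
        "abiword": ["abiword", "--to=txt", "--to-name=fd://1"],
    }
    for name, (secs, text) in compare(extractors, args[0]).items():
        print(f"{name}: {secs:.3f}s, {len(text.split())} words extracted")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Comparing word counts is only a crude signal; diffing the extracted text for a document where antiword is suspected to fail would give the concrete example asked for above.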
--
Ticket URL: <http://trac.xapian.org/ticket/383#comment:2>
Xapian <http://xapian.org/>