[Xapian-tickets] [Xapian] #383: Patch to replace antiword with abiword

Xapian nobody at xapian.org
Fri Jun 12 15:42:58 BST 2009


#383: Patch to replace antiword with abiword
-------------------------+--------------------------------------------------
 Reporter:  frankjb      |       Owner:  olly
     Type:  enhancement  |      Status:  new 
 Priority:  normal       |   Milestone:      
Component:  Other        |     Version:      
 Severity:  normal       |    Keywords:      
Blockedby:               |    Platform:  All 
 Blocking:               |  
-------------------------+--------------------------------------------------

Comment(by olly):

 Um, we've already had essentially this same discussion on the mailing list
 (and you took part!)

 http://thread.gmane.org/gmane.comp.search.xapian.general/7310/focus=7347

 To summarise, the current status is:

   * it appears antiword is unmaintained (that's not actually a huge issue
 if it does the job, though it's certainly not a positive thing).

   * openoffice could replace it, but we don't currently have a clean
 solution, and openoffice is a rather heavyweight dependency, so it would
 be better to have a lightweight default with openoffice as an option.

   * abiword could also replace it, but it's also not terribly lightweight
 (probably not as bad as openoffice though).

   * we could easily replace antiword with wvWare, but wvWare is ~5 times
 slower, which is a bit of a hit to take just to be using a more actively
 maintained extractor - I feel there needs to be a more concrete benefit to
 be gained to justify this.

   * my (admittedly limited, as I don't have many .doc examples) testing
 showed equivalently good results from antiword and wvWare (and showed
 abiword to be essentially identical to wvWare, as you would expect).

   * you've claimed that antiword fails to correctly extract text from some
 documents, but didn't respond to my request for examples of such
 documents, so it's hard for me to judge for myself how serious this is.  I
 haven't seen such reports from anyone else, but perhaps nobody else has
 looked at antiword's output in detail.  It's hard for me to tell with the
 information I currently have...
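
 For anyone who wants to reproduce this kind of speed and output comparison
 on their own .doc files, here is a rough sketch in Python.  It is not the
 script used for the figures above.  The function names are made up for
 illustration, and it assumes each extractor is invoked as "command FILE"
 and writes plain text to stdout (antiword does this; an extractor that
 writes to an output file would need a small wrapper).

```python
import subprocess
import time


def time_extractor(command, doc_path, runs=3):
    """Run one extractor on one file; return (best_seconds, text).

    `command` is an argument list such as ["antiword"]; the document
    path is appended to it.  Assumes the extractor prints the extracted
    text on stdout.  Raises CalledProcessError if the extractor fails.
    """
    best = float("inf")
    text = ""
    for _ in range(runs):
        start = time.perf_counter()
        result = subprocess.run(
            command + [doc_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=True,
        )
        best = min(best, time.perf_counter() - start)
        text = result.stdout.decode("utf-8", errors="replace")
    return best, text


def word_set(text):
    """Whitespace-separated words, as a crude proxy for indexable terms."""
    return set(text.split())


def compare(doc_path, extractors):
    """Map extractor name -> (best_seconds, word_set) for one document."""
    results = {}
    for name, command in extractors.items():
        seconds, text = time_extractor(command, doc_path)
        results[name] = (seconds, word_set(text))
    return results


# Example use (hypothetical file name):
#   results = compare("sample.doc", {"antiword": ["antiword"]})
#   for name, (seconds, words) in results.items():
#       print(name, round(seconds, 3), len(words))
```

 Comparing the word sets from two extractors (for example with set
 difference) gives a quick indication of text that one of them missed,
 which is the sort of evidence that would help settle the last point.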

-- 
Ticket URL: <http://trac.xapian.org/ticket/383#comment:2>
Xapian <http://xapian.org/>


