[Xapian-discuss] antiword
Henry
henka at cityweb.co.za
Thu May 14 18:42:42 BST 2009
Quoting "Olly Betts" <olly at survex.com>:
> If we're going to pick a better default converter, I'd rather do so
> based on trying the various options on a set of sample documents and
> comparing the output, time taken, and memory requirements rather than
> relying on anecdotal reports that a particular option has trouble with
> "many" documents.
You may want to try wvWare (http://wvware.sourceforge.net/). It's
also becoming dated, but still does a good job of converting MS Word
(.doc) files. I've found it preserves the layout better than the
others too (even AbiWord). This is not as important for indexing, but
it is for displaying the cached (converted) version, etc.
>> I've been looking into getting openoffice to do it in headless mode but
>> still have a way to go before it's stable.
>> I was wondering if anyone else had any luck on this front?
I tried this, but crikey, it's dependency-hell and requires some
hackery to achieve headless state.
>> One quick fix I have found for word documents is by using abiword
I also got this working pretty painlessly, but it's also resource
intensive, as Olly says. My concern on our cluster was to use
something as lightweight as possible, while still striking a balance
on file format compatibility. That being said, it's possible that
AbiWord might be the best forward-looking solution...
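
For what it's worth, the kind of comparison Olly describes could be
sketched roughly as below. This is only a minimal sketch under some
assumptions: the helper names (measure, compare) are hypothetical, each
converter is assumed to write its text to stdout, and only wall-clock
time, output size and failures are recorded. Memory use is left out and
would need separate measurement. The only command line shown is plain
"antiword FILE"; other converters would need their own command lines.

```python
import subprocess
import sys
import time


def measure(cmd, timeout=60):
    """Run one converter command; return (ok, seconds, output_bytes)."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Converter missing, not executable, or hung on this document.
        return False, time.perf_counter() - start, 0
    elapsed = time.perf_counter() - start
    return proc.returncode == 0, elapsed, len(proc.stdout)


def compare(converters, documents):
    """Run every converter over every document and total the results.

    converters maps a label to a function that builds the argument
    list for one document path (hypothetical interface for this sketch).
    """
    results = {}
    for name, make_cmd in converters.items():
        rows = [measure(make_cmd(doc)) for doc in documents]
        results[name] = {
            "failed": sum(1 for ok, _, _ in rows if not ok),
            "seconds": sum(secs for _, secs, _ in rows),
            "bytes": sum(size for _, _, size in rows),
        }
    return results


if __name__ == "__main__":
    # Sample documents are given on the command line.
    converters = {"antiword": lambda doc: ["antiword", doc]}
    for name, totals in compare(converters, sys.argv[1:]).items():
        print(name, totals)
```

Output size is only a crude stand-in for comparing the converted text
itself; the actual output would still need to be checked by eye on a
few of the sample documents.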
Cheers
Henry