[Xapian-discuss] docx support
Olly Betts
olly at survex.com
Thu Jul 24 02:51:26 BST 2008
On Thu, Jul 24, 2008 at 04:08:26AM +0930, Frank Bruzzaniti wrote:
> Are Office 2007 formats like docx supported?
Out of the box, not unless antiword supports it. The last update to the
Debian-packaged version was in August 2006, so I suspect the answer is
"no".
> Is there any way to get xapian to index office 2007 formats?
>
> Is there any option/procedure to add a new mime plugin?
> For example if you rename a docx to .zip you can retrieve text from
> document.xml
I assume you mean for Omega's omindex indexer?
There isn't currently a way to configure additional filters without
modifying the source code in omindex.cc. Ideally there should be a
configuration file to allow this, but it's not implemented yet. That
said, it's quite easy to wire in additional external filters if you
aren't scared of dabbling in C++.
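
To illustrate the document.xml approach mentioned in the question, here is a
rough sketch of the text-extraction half only; it is not how omindex does
anything, and it doesn't cover wiring a filter into omindex.cc. It relies on
a docx being a zip archive whose main body lives in word/document.xml, and it
uses only Python's standard library. The function name docx_to_text is made
up for this example, and headers, footers, footnotes and tables nested in
unusual ways are not handled specially.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the body text of a .docx file as a plain string.

    `source` may be a filename or a binary file-like object.
    (Hypothetical helper for illustration; not part of Xapian or Omega.)
    """
    # A .docx is an ordinary zip archive, so no renaming is needed.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its visible text sits in <w:t> elements.
    for para in root.iter(f"{{{W_NS}}}p"):
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    # One line of output per paragraph.
    return "\n".join(paragraphs)
```

A small script that prints docx_to_text(path) to stdout would behave like the
other external converters: a command that takes a file and emits plain text.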
Rather than writing a full guide here, I'm going to write this up as a
wiki page, since that will be easier for others to find in the future.
I'll reply again when I'm done.
Cheers,
Olly