[Xapian-discuss] docx support

Olly Betts olly at survex.com
Thu Jul 24 02:51:26 BST 2008


On Thu, Jul 24, 2008 at 04:08:26AM +0930, Frank Bruzzaniti wrote:
> Are Office 2007 formats like docx supported?

Out of the box, not unless antiword supports it.  The last update to the
Debian package of antiword was August 2006, which predates the Office
2007 release, so I suspect the answer is "no".

> Is there any way to get xapian to index office 2007 formats?
> 
> Is there any option/procedure to add a new mime plugin?
> For example, if you rename a docx to .zip you can retrieve text from
> document.xml

I assume you mean for Omega's omindex indexer?

There isn't currently a way to configure additional filters without
modifying the source code in omindex.cc.  Ideally there should be a
configuration file to allow this, but it's not implemented yet.  That
said, it's quite easy to wire in additional external filters if you
aren't scared of dabbling in C++.
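
To illustrate the docx-is-a-zip observation above, here is a rough Python
sketch of the text extraction an external filter could perform.  This is
not something omindex does today, and how such a command would be hooked
into omindex.cc isn't shown.  The function name docx_to_text and the
script name docx2text.py are made up for the example, and it assumes the
main body lives at word/document.xml inside the archive, with text held
in w:t elements grouped into w:p paragraphs.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by Office 2007 .docx files.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the plain text of a .docx given a path or file-like object.

    Hypothetical helper for illustration; not part of Xapian or Omega.
    """
    with zipfile.ZipFile(source) as archive:
        # Assumption: the main document body is stored at this path.
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each w:t element holds one run of text within the paragraph.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python docx2text.py file.docx
    print(docx_to_text(sys.argv[1]))
```

This only handles body text in the simplest case; headers, footers,
footnotes and nested paragraphs (text boxes, for instance) would need
more care.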

Rather than writing a full guide here, I'm going to write this up as a
wiki page, since that will be easier for others to find in the future.
I'll reply again when I'm done.

Cheers,
    Olly


