[Xapian-discuss] docx support
Olly Betts
olly at survex.com
Thu Jul 24 03:53:04 BST 2008
On Thu, Jul 24, 2008 at 02:51:26AM +0100, Olly Betts wrote:
> Rather than writing a full guide here, I'm going to write this up as a
> wiki page, since that will be easier for others to find in the future.
> I'll reply again when I'm done.
http://trac.xapian.org/wiki/FAQ/OmegaNewFileFormat
> > Is there any option/procedure to add a new mime plugin?
> > For example, if you rename a .docx file to .zip, you can retrieve the
> > text from document.xml
That's quite easy to do - you should be able to base the code heavily
on the existing code which handles OpenDocument format. That code
extracts XML files from inside a Zip format file with extension .odt or
similar, and then does simple parsing to extract the document text.
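
To illustrate the general idea (this is only a rough sketch in Python,
not Omega's actual code, which is C++), extracting text from a .docx
might look something like the following. The member name
word/document.xml and the w:t / w:p element names come from the Office
Open XML format rather than from anything in Omega, and the function
name docx_text is just made up for this example. It ignores headers,
footers, footnotes and so on.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text(source):
    """Return the body text of a .docx file, one line per paragraph.

    `source` may be a filename or a seekable file-like object.
    (Hypothetical helper for illustration only.)
    """
    # A .docx file is a Zip archive; the main body lives in this member.
    with zipfile.ZipFile(source) as zf:
        xml_bytes = zf.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; the text itself is in <w:t> elements.
    for p in root.iter(f"{{{W_NS}}}p"):
        runs = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

You'd call it as docx_text("report.docx") (filename invented here) and
pass the resulting text on to the indexer.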
Cheers,
Olly