[Xapian-discuss] docx support

Frank Bruzzaniti frank.bruzzaniti at gmail.com
Thu Jul 24 12:32:48 BST 2008


I was going to try Flax if I couldn't get this working on a Linux box.

One question I have about omindex: when I run a crawl I see:

Indexing "/New Spreadsheet.ots" as 
application/vnd.oasis.opendocument.spreadsheet-template ... updated.

I assume omindex uses OpenOffice to do the conversion.

I can open *.docx with OpenOffice and save as *.txt. How come you don't 
use OpenOffice for the bulk of your conversions?
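
For what it's worth, a .docx file is a zip archive whose main body is stored
as XML in word/document.xml, so the text can in principle be pulled out
without OpenOffice at all. Below is a minimal sketch of that idea in Python.
The function name docx_to_text is hypothetical and is not part of omindex or
Xapian; the sketch only reads the main document body and ignores headers,
footers, footnotes and formatting, and paragraphs nested inside text boxes
may show up twice.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(source):
    """Return the body text of a .docx file, one line per paragraph.

    `source` is a path or a binary file-like object. Hypothetical helper
    for illustration only; not how omindex does its conversions.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    # Each <w:p> is a paragraph; its visible text lives in <w:t> runs.
    for para in root.iter(f"{{{W_NS}}}p"):
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Something like this could be wrapped in a small script that prints the text
to stdout, which is the shape of output an indexer would want from an
external converter.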

Charlie Hull wrote:
> Olly Betts wrote:
>   
>> On Thu, Jul 24, 2008 at 04:08:26AM +0930, Frank Bruzzaniti wrote:
>>     
>>> Are Office 2007 formats like docx supported?
>>>       
>> Out of the box, not unless antiword supports it.  The last update to the
>> Debian packaged version was August 2006, so I suspect the answer is
>> "no".
>>
>>     
> Just to say that we've looked at this for Flax and we're using the 
> IFilter system, which since it is provided for Microsoft is pretty good 
> with Microsoft formats. Of course, this only works on a Windows box, and 
> needs COM, and it's not open source, so you'd probably need to parse 
> into an intermediate format. There's a list of available IFilters on 
> www.ifilter.org
>
> Cheers
>
> Charlie

