[Xapian-tickets] [Xapian] #290: Omega support for Office 2007 Word and Excel Documents

Xapian nobody at xapian.org
Wed Feb 18 14:36:20 GMT 2009


#290: Omega support for Office 2007 Word and Excel Documents
-------------------------+--------------------------------------------------
 Reporter:  frankjb      |        Owner:  olly     
     Type:  enhancement  |       Status:  assigned 
 Priority:  normal       |    Milestone:  1.1.1    
Component:  Omega        |      Version:  SVN trunk
 Severity:  normal       |   Resolution:           
 Keywords:               |    Blockedby:           
 Platform:  All          |     Blocking:           
-------------------------+--------------------------------------------------

Comment(by frankjb):

 Replying to [comment:2 olly]:
 > I've taken a closer look at the patch.  It looks good apart from the
 > lack of documentation updates and example files.  I'm afraid I don't
 > currently have the time to update the documentation or track down
 > suitable examples myself right now, so I'm moving the milestone to
 > 1.1.0.
 >
 > While checking the content-types used were appropriate (oddly they
 > aren't listed by IANA, but they are mentioned in posts of
 > blogs.msdn.com so I guess they're OK) I found there are some other
 > formats from which we can probably extract text in the same way.
 >
 > If you can comment on any of the following, that would help.
 > Otherwise I'll research as time allows.
 >
 > http://blogs.msdn.com/dmahugh/archive/2006/08/08/692600.aspx lists
 > more extensions and content types:
 >
 > * Are the weirdly-named "macroEnabled.12" variants compatible formats?
 >
 > * We handle .dot files, so should handle .dotx too if possible.
 >
 > * We should handle .ppsx and .pptx if the same approach works (and
 >   .ppsm and .pptm if they have the same format).
 >
 > * Ditto .xps.
 >
 > http://blogs.msdn.com/ericwhite/pages/the-openxmldocument-class.aspx
 > also mentions "drawings".

 The XPS format is doable; it's very similar.

 The Office 2007 MIME types are listed here:
 http://blogs.msdn.com/vsofficedeveloper/pages/Office-2007-Open-XML-MIME-Types.aspx

 I can't find any mention of a "macroEnabled.12" variant for any of the
 "openxmlformats" types such as docx.

 I'll see if I can filter this lot:
 .docx
 .dotx
 .xlsx
 .xltx
 .pptx
 .potx
 .ppsx

 BTW, for the test documents, would it work if I use some text in
 Japanese and English?
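
 One way to check that mixed Japanese and English text survives
 extraction is a small round-trip test.  The Python sketch below assumes
 the text of a .docx lives in the word/document.xml member of the zip
 container and is stored as UTF-8.  The helpers make_sample and
 extract_text are hypothetical, and the sample it builds holds only that
 one member, so it is enough for this check but is not a file Word would
 open.

```python
import io
import zipfile
from xml.etree import ElementTree
from xml.sax.saxutils import escape

# WordprocessingML namespace used by w:p (paragraph) and w:t (text run).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_sample(paragraphs):
    """Build an in-memory zip holding a minimal word/document.xml (test aid only)."""
    body = "".join(
        "<w:p><w:r><w:t>{}</w:t></w:r></w:p>".format(escape(p)) for p in paragraphs
    )
    xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<w:document xmlns:w="{}"><w:body>{}</w:body></w:document>'
    ).format(W_NS, body)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml.encode("utf-8"))
    return buf.getvalue()


def extract_text(data):
    """Return the paragraph text of word/document.xml, one paragraph per line."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ElementTree.fromstring(archive.read("word/document.xml"))
    lines = []
    for para in root.iter("{%s}p" % W_NS):
        # Join every text run inside the paragraph.
        lines.append("".join(t.text or "" for t in para.iter("{%s}t" % W_NS)))
    return "\n".join(lines)


sample = make_sample(["Hello world", "こんにちは世界"])
extracted = extract_text(sample)
```

 If extracted comes back with both paragraphs intact, the same pair of
 strings could serve as the content of a real test document saved from
 Word.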

-- 
Ticket URL: <http://trac.xapian.org/ticket/290#comment:5>
Xapian <http://xapian.org/>