[Xapian-tickets] [Xapian] #290: Omega support for Office 2007 Word and Excel Documents

Xapian nobody at xapian.org
Tue Sep 2 09:51:22 BST 2008


#290: Omega support for Office 2007 Word and Excel Documents
-------------------------+--------------------------------------------------
 Reporter:  frankjb      |        Owner:  olly    
     Type:  enhancement  |       Status:  assigned
 Priority:  normal       |    Milestone:  1.1.0   
Component:  Omega        |      Version:  SVN HEAD
 Severity:  normal       |   Resolution:          
 Keywords:               |    Blockedby:          
 Platform:  All          |     Blocking:          
-------------------------+--------------------------------------------------
Changes (by olly):

  * milestone:  1.0.8 => 1.1.0


Comment:

 I've taken a closer look at the patch.  It looks good apart from the lack
 of documentation updates and example files.  I'm afraid I don't have time
 right now to update the documentation or track down suitable examples
 myself, so I'm moving the milestone to 1.1.0.

 While checking that the content-types used were appropriate (oddly they
 aren't listed by IANA, but they are mentioned in posts on blogs.msdn.com,
 so I guess they're OK) I found there are some other formats from which we
 can probably extract text in the same way.

 If you can comment on any of the following, that would help.  Otherwise
 I'll research as time allows.

 http://blogs.msdn.com/dmahugh/archive/2006/08/08/692600.aspx lists more
 extensions and content types:

 * Are the weirdly-named "macroEnabled.12" variants compatible formats?

 * We handle .dot files, so should handle .dotx too if possible.

 * We should handle .ppsx and .pptx if the same approach works (and .ppsm
 and .pptm if they have the same format).

 * Ditto .xps.

 http://blogs.msdn.com/ericwhite/pages/the-openxmldocument-class.aspx also
 mentions "drawings".
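
 For anyone researching the questions above, here is a rough sketch of what
 "extracting text in the same way" might look like.  It is not the code from
 the patch, which this comment doesn't show.  It assumes only that these
 files are ZIP packages whose main text lives in well-known XML parts.  The
 names PARTS, extract_text and make_sample_docx are hypothetical, and the
 part paths are the conventional defaults rather than something read from
 the package's relationship files.

```python
import io
import re
import zipfile
from xml.etree import ElementTree

# Hypothetical table (not from the patch): conventional locations of the
# text-bearing XML parts for a few Office 2007 extensions.
PARTS = {
    ".docx": r"word/document\.xml",
    ".dotx": r"word/document\.xml",
    ".xlsx": r"xl/sharedStrings\.xml",
    ".pptx": r"ppt/slides/slide\d+\.xml",
    ".ppsx": r"ppt/slides/slide\d+\.xml",
}


def extract_text(data: bytes, pattern: str) -> str:
    """Return the text nodes of every ZIP member whose name matches pattern."""
    chunks = []
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        # Names are sorted lexically, so slide10 sorts before slide2.
        for name in sorted(package.namelist()):
            if re.fullmatch(pattern, name):
                root = ElementTree.fromstring(package.read(name))
                chunks.extend(t.strip() for t in root.itertext() if t.strip())
    return " ".join(chunks)


def make_sample_docx() -> bytes:
    """Build a tiny in-memory package with just a word/document.xml part."""
    ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xml = (
        f'<w:document xmlns:w="{ns}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Omega</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        package.writestr("word/document.xml", xml)
    return buf.getvalue()


if __name__ == "__main__":
    print(extract_text(make_sample_docx(), PARTS[".docx"]))
```

 Whether the "macroEnabled.12" variants and .xps can be handled by a table
 entry like these is exactly the open question above; the sketch doesn't
 answer it.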

-- 
Ticket URL: <http://trac.xapian.org/ticket/290#comment:2>
Xapian <http://xapian.org/>