[Xapian-tickets] [Xapian] #290: Omega support for Office 2007 Word and Excel Documents
Xapian
nobody at xapian.org
Tue Sep 2 09:51:22 BST 2008
#290: Omega support for Office 2007 Word and Excel Documents
-------------------------+--------------------------------------------------
Reporter: frankjb | Owner: olly
Type: enhancement | Status: assigned
Priority: normal | Milestone: 1.1.0
Component: Omega | Version: SVN HEAD
Severity: normal | Resolution:
Keywords: | Blockedby:
Platform: All | Blocking:
-------------------------+--------------------------------------------------
Changes (by olly):
* milestone: 1.0.8 => 1.1.0
Comment:
I've taken a closer look at the patch. It looks good apart from the lack
of documentation updates and example files. I'm afraid I don't have the
time right now to update the documentation or track down suitable examples
myself, so I'm moving the milestone to 1.1.0.
While checking that the content-types used were appropriate (oddly, they
aren't listed by IANA, but they are mentioned in posts on blogs.msdn.com,
so I guess they're OK), I found there are some other formats from which we
can probably extract text in the same way.
If you can comment on any of the following, that would help. Otherwise
I'll research as time allows.
http://blogs.msdn.com/dmahugh/archive/2006/08/08/692600.aspx lists more
extensions and content types:
* Are the weirdly-named "macroEnabled.12" variants compatible formats?
* We handle .dot files, so should handle .dotx too if possible.
* We should handle .ppsx and .pptx if the same approach works (and .ppsm
and .pptm if they have the same format).
* Ditto .xps.
http://blogs.msdn.com/ericwhite/pages/the-openxmldocument-class.aspx also
mentions "drawings".
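For anyone investigating the questions above, here is a minimal sketch of
what "extract text in the same way" could look like. It is not the code
from the patch. It assumes each of these formats is a zip package whose
text lives in XML parts at the usual Office Open XML locations
(word/document.xml, xl/sharedStrings.xml, ppt/slides/slide*.xml). The
names TEXT_PARTS, extract_text and make_sample_docx are hypothetical, and
whether the "macroEnabled.12" variants use the same layout is exactly the
open question.

```python
import fnmatch
import io
import xml.etree.ElementTree as ET
import zipfile

# Assumed locations of the text-bearing parts inside each package type.
TEXT_PARTS = {
    ".docx": ["word/document.xml"],
    ".xlsx": ["xl/sharedStrings.xml"],
    ".pptx": ["ppt/slides/slide*.xml"],
}


def extract_text(package, extension):
    """Return the text nodes of the matching XML parts, joined by spaces."""
    patterns = TEXT_PARTS[extension.lower()]
    chunks = []
    with zipfile.ZipFile(package) as zf:
        for name in sorted(zf.namelist()):
            if any(fnmatch.fnmatchcase(name, p) for p in patterns):
                root = ET.fromstring(zf.read(name))
                # Collect element text only; all markup is discarded.
                for node in root.iter():
                    if node.text and node.text.strip():
                        chunks.append(node.text.strip())
    return " ".join(chunks)


def make_sample_docx():
    """Build a tiny in-memory package for testing (not a complete .docx)."""
    ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    body = (
        '<w:document xmlns:w="%s"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Omega</w:t></w:r></w:p>"
        "</w:body></w:document>"
    ) % ns
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", body)
    buf.seek(0)
    return buf
```

Running extract_text on a real .dotx, .pptx or macro-enabled file and
checking whether the expected part exists would be a quick way to answer
the compatibility questions in the list. Parts are visited in sorted name
order, so slide10.xml is read before slide2.xml.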
--
Ticket URL: <http://trac.xapian.org/ticket/290#comment:2>
Xapian <http://xapian.org/>