[Xapian-tickets] [Xapian] #290: Omega support for Office 2007 Word and Excel Documents
Xapian
nobody at xapian.org
Wed Feb 18 14:36:20 GMT 2009
#290: Omega support for Office 2007 Word and Excel Documents
-------------------------+--------------------------------------------------
Reporter: frankjb | Owner: olly
Type: enhancement | Status: assigned
Priority: normal | Milestone: 1.1.1
Component: Omega | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blockedby:
Platform: All | Blocking:
-------------------------+--------------------------------------------------
Comment(by frankjb):
Replying to [comment:2 olly]:
> I've taken a closer look at the patch. It looks good apart from the
> lack of documentation updates and example files. I'm afraid I don't
> currently have the time to update the documentation or track down suitable
> examples myself right now, so I'm moving the milestone to 1.1.0.
>
> While checking the content-types used were appropriate (oddly they
> aren't listed by IANA, but they are mentioned in posts of blogs.msdn.com
> so I guess they're OK) I found there are some other formats from which
> we can probably extract text in the same way.
>
> If you can comment on any of the following, that would help. Otherwise
> I'll research as time allows.
>
> http://blogs.msdn.com/dmahugh/archive/2006/08/08/692600.aspx lists more
> extensions and content types:
>
> * Are the weirdly-named "macroEnabled.12" variants compatible formats?
>
> * We handle .dot files, so should handle .dotx too if possible.
>
> * We should handle .ppsx and .pptx if the same approach works (and .ppsm
>   and .pptm if they have the same format).
>
> * Ditto .xps.
>
> http://blogs.msdn.com/ericwhite/pages/the-openxmldocument-class.aspx
> also mentions "drawings".
The XPS format is doable; it's very similar.
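To illustrate why the two cases look similar: both .docx and .xps files are ZIP containers holding XML parts, so text extraction comes down to unzipping the right part and pulling the text out of the XML. The following is only a rough sketch in Python, not Omega's filter code. The function names are hypothetical, and the part names (word/document.xml for .docx, *.fpage parts with Glyphs elements for .xps) are assumptions based on the usual package layout rather than something guaranteed for every file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def docx_text(data):
    """Collect character data from word/document.xml of a .docx given as bytes.

    Assumes the main document part lives at the conventional path.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    # Join all non-empty text nodes; good enough for indexing purposes.
    return " ".join(t.strip() for t in root.itertext() if t.strip())


def xps_text(data):
    """Collect UnicodeString attributes of Glyphs elements from .fpage parts."""
    out = []
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        # Sorted by name only; this is not necessarily the real page order.
        for name in sorted(z.namelist()):
            if not name.lower().endswith(".fpage"):
                continue
            root = ET.fromstring(z.read(name))
            for el in root.iter():
                # Tags look like "{namespace}Glyphs"; compare the local name.
                if el.tag.rsplit("}", 1)[-1] == "Glyphs":
                    text = el.get("UnicodeString")
                    if text:
                        out.append(text)
    return " ".join(out)
```

The only real difference between the two functions is where the text sits: in element content for .docx, in an attribute for .xps.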
The Office 2007 MIME types are listed here:
http://blogs.msdn.com/vsofficedeveloper/pages/Office-2007-Open-XML-MIME-Types.aspx
I can't find any mention of a "macroEnabled.12" variant for any of the
"openxmlformats" types such as docx.
I'll see if I can filter this lot:
.docx
.dotx
.xlsx
.xltx
.pptx
.potx
.ppsx
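
For reference, the seven extensions above could be mapped to content types roughly as follows. This is a sketch only: the table name and the helper function are made up for illustration, and the MIME strings are the "openxmlformats" ones from the Office 2007 Open XML MIME types page linked above, not something Omega is confirmed to use.

```python
import os

# Hypothetical lookup table: file extension -> Open XML content type.
OOXML_MIME_TYPES = {
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".dotx": "application/vnd.openxmlformats-officedocument.wordprocessingml.template",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".xltx": "application/vnd.openxmlformats-officedocument.spreadsheetml.template",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".potx": "application/vnd.openxmlformats-officedocument.presentationml.template",
    ".ppsx": "application/vnd.openxmlformats-officedocument.presentationml.slideshow",
}


def ooxml_mime_type(filename):
    """Return the content type for filename, or None if it is not in the table."""
    ext = os.path.splitext(filename)[1].lower()
    return OOXML_MIME_TYPES.get(ext)
```

Macro-enabled files such as .docm are deliberately left out of the table, so the helper returns None for them.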
BTW, for the test documents, would it work if I used some text in Japanese
and English?
--
Ticket URL: <http://trac.xapian.org/ticket/290#comment:5>
Xapian <http://xapian.org/>