[Xapian-discuss] PowerPoint 2007 filter
Frank John Bruzzaniti
frank.bruzzaniti at gmail.com
Tue Feb 3 13:43:27 GMT 2009
Hi,
I'm trying to write the PowerPoint 2007 filter in the same manner as I
did for *.docx and *.xlsx, but I'm getting the following error when I try
to run an index.
The document is called "Frisk in Power Point.pptx", and the output is:
Indexing "/Frisk in Power Point.pptx" as application/vnd.openxmlformats-officedocument.presentationml.presentation ...
caution: filename not matched: ppt/notesSlides/notesSlide*.xml
caution: filename not matched: ppt/comments/comment*.xml
The problem is that not all pptx files contain notes and comments.
Do you think including just the slide text is enough? If not, how can I
test whether those files exist? It looks like unzip reports an error if
the file doesn't exist. Can I test for this with a couple of ifs? (My C
isn't very good, so I was hoping someone could help me with the coding.)
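A minimal sketch of the "couple of ifs" approach, assuming Info-ZIP unzip, whose documented exit status 11 means "no matching files were found": probe each optional pattern with `unzip -l` first and add it to the extraction command only if something matched. `zip_has_member` and `pptx_command` are hypothetical helpers, not part of omindex.cc, and `safefile` is assumed to be already shell-protected, as `shell_protect(file)` does in the omindex.cc snippet.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <sys/wait.h>

// Returns true if the zip archive contains at least one member matching
// 'pattern'. 'safefile' must already be shell-protected, and 'pattern'
// must not contain a single quote. Assumes Info-ZIP unzip exits with 0
// when something matched and 11 when nothing did.
static bool zip_has_member(const std::string &safefile,
                           const std::string &pattern)
{
    // 'unzip -l' only lists matching members; the listing and any
    // "caution: filename not matched" warning are discarded.
    std::string cmd = "unzip -l " + safefile + " '" + pattern +
                      "' >/dev/null 2>&1";
    int status = std::system(cmd.c_str());
    // Any failure (no match, missing archive, unzip not installed)
    // is treated as "not present".
    return status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

// Builds the extraction command, adding the optional parts only when
// the presentation actually contains them.
static std::string pptx_command(const std::string &safefile,
                                bool has_notes, bool has_comments)
{
    std::string cmd = "unzip -p " + safefile + " 'ppt/slides/slide*.xml'";
    if (has_notes) cmd += " 'ppt/notesSlides/notesSlide*.xml'";
    if (has_comments) cmd += " 'ppt/comments/comment*.xml'";
    return cmd;
}
```

In the pptx branch this would replace the fixed command with something like `string cmd = pptx_command(safefile, zip_has_member(safefile, "ppt/notesSlides/notesSlide*.xml"), zip_has_member(safefile, "ppt/comments/comment*.xml"));`. The cost is two extra unzip runs per file.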
Here's what I have so far in omindex.cc. It works for the main slides;
you will see the other command commented out, which also extracts the
notes and comments from the PowerPoint file.
// Start: PowerPoint 2007 .pptx
} else if (startswith(mimetype,
                      "application/vnd.openxmlformats-officedocument.presentationml."))
{
    // Inspired by http://mjr.towers.org.uk/comp/sxw2text
    string safefile = shell_protect(file);
    // This fuller command also extracts the notes and comments, but unzip
    // reports "caution: filename not matched" for presentations without them:
    // string cmd = "unzip -p " + safefile + " 'ppt/slides/slide*.xml'"
    //     " 'ppt/notesSlides/notesSlide*.xml' 'ppt/comments/comment*.xml'";
    //
    // The wildcard is quoted so the shell passes it to unzip unexpanded.
    string cmd = "unzip -p " + safefile + " 'ppt/slides/slide*.xml'";
    try {
        XmlParser xmlparser;
        xmlparser.parse_html(stdout_to_string(cmd));
        dump = xmlparser.dump;
    } catch (ReadError) {
        cout << "\"" << cmd << "\" failed - skipping\n";
        return;
    }
// End: PowerPoint 2007 .pptx
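An alternative that avoids the extra probes, sketched as an assumption rather than anything omindex provides: keep the single combined command, send unzip's warnings to /dev/null, and treat exit status 11 as success, since unzip still writes every member that did match to stdout. If omindex's `stdout_to_string` treats any non-zero exit as a `ReadError` (as the catch block above suggests), it can't be used for this directly, so the sketch uses a hypothetical standalone helper, `run_allowing_status`, built on POSIX `popen`/`pclose`.

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>
#include <sys/wait.h>

// Runs 'cmd' through the shell and returns its standard output.
// Exit status 0 is success, and so is 'also_ok' (pass 11 for unzip,
// which uses it for "no matching files were found"). Anything else throws.
static std::string run_allowing_status(const std::string &cmd, int also_ok)
{
    FILE *fh = popen(cmd.c_str(), "r");
    if (fh == NULL) throw std::runtime_error("popen failed: " + cmd);
    std::string out;
    char buf[4096];
    size_t len;
    // Read until EOF or a read error.
    while ((len = fread(buf, 1, sizeof(buf), fh)) > 0)
        out.append(buf, len);
    bool read_error = ferror(fh) != 0;
    int status = pclose(fh);
    if (read_error || status == -1 || !WIFEXITED(status))
        throw std::runtime_error("\"" + cmd + "\" failed");
    int code = WEXITSTATUS(status);
    if (code != 0 && code != also_ok)
        throw std::runtime_error("\"" + cmd + "\" exited with status " +
                                 std::to_string(code));
    return out;
}
```

For the pptx branch this would be called as `run_allowing_status("unzip -p " + safefile + " 'ppt/slides/slide*.xml' 'ppt/notesSlides/notesSlide*.xml' 'ppt/comments/comment*.xml' 2>/dev/null", 11)`, catching `std::runtime_error` where the original catches `ReadError`. The drawback is that 11 is also what unzip returns when nothing matches at all, in which case the dump is simply empty.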
FYI, the MIME type mapping I added was:
mime_map["pptx"] =
"application/vnd.openxmlformats-officedocument.presentationml.presentation";