[Xapian-discuss] PowerPoint 2007 filter

Frank John Bruzzaniti frank.bruzzaniti at gmail.com
Tue Feb 3 13:43:27 GMT 2009


Hi,

I'm trying to write the PowerPoint 2007 filter in the same manner that I
did for *.docx and *.xlsx, but I'm getting the following error when I try
to run an index.

The document is called "Frisk in Power Point.pptx", and omindex prints:

Indexing "/Frisk in Power Point.pptx" as application/vnd.openxmlformats-officedocument.presentationml.presentation ...
caution: filename not matched:  ppt/notesSlides/notesSlide*.xml
caution: filename not matched:  ppt/comments/comment*.xml

The problem is that not all pptx files contain notes and comments.

Do you think just including the slide text is enough? If not, how can I
test whether the files exist? It looks like unzip throws an error if
the file doesn't exist. Can I test for this with a couple of ifs? (My C
isn't very good, so I was hoping someone could help me with the coding.)
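One way to handle the optional parts, sketched under assumptions: run unzip once per pattern and treat unzip's documented exit status 11 ("no matching files were found") as "not present" for the notes and comments patterns, while still failing if the slides cannot be extracted. The "caution: filename not matched" lines above come from unzip, which presumably exits non-zero and so makes the combined command fail. The helpers `run_capture`, `quote_for_shell` and `extract_pptx_xml` are hypothetical names for illustration only; inside omindex.cc you would presumably use the existing `shell_protect()` and `stdout_to_string()` instead, though whether `stdout_to_string()` exposes unzip's exit status is not established here.

```cpp
#include <cstdio>
#include <string>
#include <vector>
#include <sys/wait.h>

// Hypothetical helper: run a shell command via popen(), capture stdout into
// `out`, and return the exit status (or -1 if it could not be run/waited on).
int run_capture(const std::string &cmd, std::string &out) {
    out.clear();
    FILE *fh = popen(cmd.c_str(), "r");
    if (!fh) return -1;
    char buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), fh)) > 0) out.append(buf, n);
    int status = pclose(fh);
    if (status == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}

// Hypothetical stand-in for omindex's shell_protect(): single-quote the
// string, turning each embedded ' into '\''.
std::string quote_for_shell(const std::string &s) {
    std::string r = "'";
    for (char c : s) {
        if (c == '\'') r += "'\\''";
        else r += c;
    }
    r += "'";
    return r;
}

// unzip's documented exit status for "no matching files were found".
const int UNZIP_NO_MATCH = 11;

// Extract slide XML (required) plus notes and comments XML (optional).
// Returns false only if a required part fails or an unexpected error occurs.
bool extract_pptx_xml(const std::string &file, std::string &xml) {
    const std::string safefile = quote_for_shell(file);
    struct Part { const char *pattern; bool required; };
    const std::vector<Part> parts = {
        {"ppt/slides/slide*.xml", true},
        {"ppt/notesSlides/notesSlide*.xml", false},
        {"ppt/comments/comment*.xml", false},
    };
    xml.clear();
    for (const Part &p : parts) {
        // Quote the wildcard so the shell passes it to unzip unexpanded,
        // and hide unzip's "caution" messages on stderr.
        std::string cmd = "unzip -p " + safefile + " '" + p.pattern +
                          "' 2>/dev/null";
        std::string out;
        int rc = run_capture(cmd, out);
        if (rc == 0) {
            xml += out;
        } else if (rc == UNZIP_NO_MATCH && !p.required) {
            continue;  // optional part absent: not an error
        } else {
            return false;
        }
    }
    return true;
}
```

The per-pattern loop is the "couple of ifs" asked about: a missing notes or comments part is skipped, anything else is treated as a failure, which mirrors the original "failed - skipping" behaviour.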

Here's what I have so far in omindex.cc. It works for the main slides;
you will see the other command commented out, which also extracts notes
and comments from the PowerPoint file.

// Start: PowerPoint 2007 .pptx
    } else if (startswith(mimetype,
            "application/vnd.openxmlformats-officedocument.presentationml.")) {
        // Inspired by http://mjr.towers.org.uk/comp/sxw2text
        string safefile = shell_protect(file);
        /* string cmd = "unzip -p " + safefile + " ppt/slides/slide*.xml "
                        "ppt/notesSlides/notesSlide*.xml ppt/comments/comment*.xml"; */
        string cmd = "unzip -p " + safefile + " ppt/slides/slide*.xml";
        try {
            XmlParser xmlparser;
            xmlparser.parse_html(stdout_to_string(cmd));
            dump = xmlparser.dump;
        } catch (ReadError) {
            cout << "\"" << cmd << "\" failed - skipping\n";
            return;
        }
    // End: PowerPoint 2007 .pptx
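An alternative sketch is to list the archive first and only ask unzip for the patterns that have at least one member. Info-ZIP unzip's zipinfo mode, `unzip -Z1 file.pptx`, prints one member name per line; the hypothetical `build_pptx_unzip_cmd` below takes that listing and returns the extraction command, or an empty string if none of the three patterns matches. It imitates unzip's `prefix*.xml` wildcard with a plain prefix/suffix test, which is an assumption that works for these three patterns but is not a general glob matcher.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split `unzip -Z1` output into member names, dropping blank lines.
std::vector<std::string> split_lines(const std::string &text) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty()) lines.push_back(line);
    }
    return lines;
}

// True if `name` has the form prefix + anything + suffix (no overlap).
bool matches_prefix_suffix(const std::string &name, const std::string &prefix,
                           const std::string &suffix) {
    return name.size() >= prefix.size() + suffix.size() &&
           name.compare(0, prefix.size(), prefix) == 0 &&
           name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: build "unzip -p <safefile> 'pattern' ..." using only
// the patterns that match at least one archive member. `safefile` must
// already be shell-quoted. Returns "" if no pattern matches.
std::string build_pptx_unzip_cmd(const std::string &safefile,
                                 const std::string &listing) {
    const char *const prefixes[] = {
        "ppt/slides/slide",
        "ppt/notesSlides/notesSlide",
        "ppt/comments/comment",
    };
    const std::vector<std::string> members = split_lines(listing);
    std::string cmd = "unzip -p " + safefile;
    bool any = false;
    for (const char *prefix : prefixes) {
        for (const std::string &m : members) {
            if (matches_prefix_suffix(m, prefix, ".xml")) {
                // Quote the wildcard so the shell passes it to unzip untouched.
                cmd += " '" + std::string(prefix) + "*.xml'";
                any = true;
                break;
            }
        }
    }
    return any ? cmd : std::string();
}
```

In omindex.cc this could presumably be driven by `stdout_to_string("unzip -Z1 " + safefile)` followed by `stdout_to_string(cmd)`, skipping the file when the returned command is empty; that wiring is untested here.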



FYI the mime type I entered was:

mime_map["pptx"] =
"application/vnd.openxmlformats-officedocument.presentationml.presentation";
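Because the filter dispatches on the prefix `application/vnd.openxmlformats-officedocument.presentationml.` rather than the full `.presentation` type, it should also pick up PowerPoint slide shows (.ppsx) and templates (.potx) if they are added to the mime map, since those registered OOXML types share the prefix and the same `ppt/slides/` layout. The standalone sketch below is an assumption-labelled illustration: `make_ooxml_mime_map`, `starts_with` and `uses_pptx_filter` are hypothetical stand-ins for omindex's `mime_map` and `startswith()`, not its actual code.

```cpp
#include <map>
#include <string>

// Hypothetical: a trimmed-down extension -> MIME type map in the style of
// omindex's mime_map, holding the registered OOXML types.
std::map<std::string, std::string> make_ooxml_mime_map() {
    std::map<std::string, std::string> mime_map;
    mime_map["docx"] = "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
    mime_map["xlsx"] = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
    mime_map["pptx"] = "application/vnd.openxmlformats-officedocument.presentationml.presentation";
    mime_map["ppsx"] = "application/vnd.openxmlformats-officedocument.presentationml.slideshow";
    mime_map["potx"] = "application/vnd.openxmlformats-officedocument.presentationml.template";
    return mime_map;
}

// Prefix test equivalent in spirit to omindex's startswith().
bool starts_with(const std::string &s, const std::string &prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// True for any MIME type the PowerPoint branch above would handle.
bool uses_pptx_filter(const std::string &mimetype) {
    return starts_with(mimetype,
        "application/vnd.openxmlformats-officedocument.presentationml.");
}
```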


