[Xapian-discuss] Index Apple iWork docs

linbloke linbloke at fastmail.fm
Tue Sep 18 03:19:08 BST 2012


G'day,

I've searched for an indexer that can index Apple iWork docs but have had
no success, so I just wanted to share the following with you for
future reference.

2. Search and Index

This tip is for server-side indexing of documents on non-Apple systems.

2.1. Keynote

A Keynote file is a zip archive (unzip does not need a .zip extension to open it), so for a given Keynote file called testxyz.key:

mkdir testxyz.key.tmp
cd testxyz.key.tmp
unzip ../testxyz.key

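All of the text in a Keynote file is stored in an XML file called index.apxl. The following pipeline adds a newline after each closing '>', strips the XML tags, keeps only printable text with strings (which also discards runs shorter than four characters), and drops lines containing leftover &gt entities, leaving only the text from the Keynote file.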

perl -pe 's/>/>\n/g' index.apxl | perl -pe 's/<(.*?)>//g' | strings | grep -v '&gt' > testxyz.key.txt

The text file can now be indexed.
See also: http://www.xml.com/pub/a/2004/01/07/keynote.html


A better way would probably be to use an XML parser, but that's beyond me. Please CC me with comments.

-- 
  linbloke
  linbloke at fastmail.fm
  FOSS for all