[Xapian-discuss] Index Apple iWork docs
linbloke
linbloke at fastmail.fm
Tue Sep 18 03:19:08 BST 2012
G'day,
I've searched for an indexer that can handle Apple iWork docs but haven't
found one, so I wanted to share the following with you for future
reference.
2. Search and Index
This tip is for server-side document indexing on non-Apple systems.
2.1. Keynote
For a given Keynote file called testxyz.key:
cp testxyz.key testxyz.key.zip
mkdir testxyz.key.tmp
cd testxyz.key.tmp
unzip ../testxyz.key.zip
All text within the Keynote file is stored in an XML file called index.apxl. The following pipeline adds a newline after the closing '>' of every tag, strips the XML tags, and then drops any lines that still contain leftover &gt; entities, leaving only the text from the Keynote file.
cat index.apxl | perl -pe 's/>/>\n/g' | perl -pe 's/<(.*?)>//g' | strings | grep -v '&gt;' > testxyz.key.txt
The text file can now be indexed.
See also: http://www.xml.com/pub/a/2004/01/07/keynote.html
A better way to do it would probably be with an XML parser, but that's beyond me. Please CC me with comments.
--
linbloke
linbloke at fastmail.fm
FOSS for all