[Xapian-discuss] Parsing .msg files

James Aylett james-xapian at tartarus.org
Fri Sep 12 17:26:30 BST 2008


On Sat, Sep 13, 2008 at 01:53:26AM +0930, Frank J Bruzzaniti wrote:

> I'm trying to parse .msg files.
> 
> I found a patch on trac but it looks like it uses a program called 
> outlook2txt which I can't find anywhere.
> 
> The other thought was to pipe the file through the utility strings and 
> then use the html parser.  I do still get a little bit of junk left over 
> though.
> 
> Anyone else know of a better way?

If you have access to a Windows machine with Outlook, you can use
Python + COM to programmatically access the Outlook object model. It's
a bit fiddly, and some parts of the model aren't exposed (there's
another plugin that is supposed to fix that, but I never got it to
work). It was sufficient for me to export several years of email to
mbox format a while back.
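
For illustration, here is a minimal sketch of that approach. It is not
the script I used for the export; it assumes the pywin32 package
(win32com) is installed alongside Outlook, and the function names
(item_to_message, export_items, outlook_inbox_items) are made up for
this example. It only copies the subject, sender, date and plain-text
body; recipients and attachments are left out.

```python
import mailbox
from email.message import EmailMessage
from email.utils import format_datetime

# Constants from the Outlook object model (assumed values:
# olFolderInbox = 6, olMail = 43).
OL_FOLDER_INBOX = 6
OL_MAIL_CLASS = 43


def item_to_message(item):
    """Build an email message from an Outlook MailItem-like object."""
    msg = EmailMessage()
    subject = getattr(item, "Subject", None)
    sender = getattr(item, "SenderEmailAddress", None)
    received = getattr(item, "ReceivedTime", None)
    if subject:
        msg["Subject"] = subject
    if sender:
        # For Exchange accounts this may not be a plain SMTP address.
        msg["From"] = sender
    if received is not None:
        # Assumes ReceivedTime behaves like a datetime.datetime.
        msg["Date"] = format_datetime(received)
    msg.set_content(getattr(item, "Body", None) or "")
    return msg


def export_items(items, mbox_path):
    """Append every mail item in `items` to an mbox file; return the count."""
    box = mailbox.mbox(mbox_path)
    count = 0
    try:
        for item in items:
            # Folders can also hold meeting requests, reports, etc.
            if getattr(item, "Class", None) != OL_MAIL_CLASS:
                continue
            box.add(item_to_message(item))
            count += 1
        box.flush()
    finally:
        box.close()
    return count


def outlook_inbox_items():
    """Return the Inbox items via COM (Windows with Outlook only)."""
    import win32com.client  # imported lazily so the rest runs anywhere

    app = win32com.client.Dispatch("Outlook.Application")
    namespace = app.GetNamespace("MAPI")
    return namespace.GetDefaultFolder(OL_FOLDER_INBOX).Items
```

On a machine with Outlook, something like
export_items(outlook_inbox_items(), "inbox.mbox") should produce a file
that other tools can read. The COM lookup is kept separate from the
mbox writing so the latter can be tried out without Outlook.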

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


