[Xapian-discuss] Dealing with image PDF's

Reini Urban rurban at x-ray.at
Thu Jul 31 08:53:24 BST 2008


2008/7/30 Frank Bruzzaniti <frank.bruzzaniti at gmail.com>:
>    // Inspired by http://mjr.towers.org.uk/comp/sxw2text
>    string safefile = shell_protect(file);
>    string cmd = "tifftopnm " + safefile + " | gocr -f UTF8 -";
>    try {
>        dump = stdout_to_string(cmd);
>    } catch (ReadError) {
>        cout << "\"" << cmd << "\" failed - skipping\n";
>        return;
>    }

Can we please finally use configure checks for helper apps like these,
to avoid runtime exceptions where the system clearly has no such app?
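
A minimal sketch of how that could look on the C++ side, assuming
configure defines two hypothetical macros, HAVE_TIFFTOPNM and HAVE_GOCR,
when it finds the helpers (for example via AC_CHECK_PROG followed by
AC_DEFINE). The macro names and the ocr_command() helper are made up for
illustration and are not taken from the omindex source or from my patch:

```cpp
#include <string>

// Hypothetical macros: a configure check would define these only when the
// helper programs were found at build time.
// #define HAVE_TIFFTOPNM 1
// #define HAVE_GOCR 1

// Build the OCR pipeline for an already shell-quoted file name.
// Returns an empty string when OCR support was not compiled in, so the
// caller can skip the file without ever spawning a missing program.
std::string ocr_command(const std::string& safefile) {
#if defined(HAVE_TIFFTOPNM) && defined(HAVE_GOCR)
    return "tifftopnm " + safefile + " | gocr -f UTF8 -";
#else
    (void)safefile;  // unused when OCR support is compiled out
    return std::string();
#endif
}
```

The caller would test for an empty command and skip the file quietly,
instead of relying on a ReadError at index time. The trade-off is that a
helper installed after the build is not picked up until a rebuild.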

I once provided a huge patch to do that:
http://thread.gmane.org/gmane.comp.search.xapian.devel/783/

A version applied to 1.0.5 is attached. There is much more in this patch,
though, so some parts may need to be stripped; see the ChangeLog.
It includes TEXTCAT support for language and charset detection and cached
virtual directories (zip, msg, pst, ...), to name a few. It has worked fine
for me for two years, and I haven't touched it since 0.9.6.
-- 
Reini Urban
http://phpwiki.org/ http://murbreak.at/

