[Xapian-devel] [Xapian-discuss] Dealing with image PDF's
Richard Boulton
richard at lemurconsulting.com
Thu Jul 31 09:55:15 BST 2008
Reini Urban wrote:
> 2008/7/30 Frank Bruzzaniti <frank.bruzzaniti at gmail.com>:
>> // Inspired by http://mjr.towers.org.uk/comp/sxw2text
>> string safefile = shell_protect(file);
>> string cmd = "tifftopnm " + safefile + " | gocr -f UTF8 -";
>> try {
>>     dump = stdout_to_string(cmd);
>> } catch (ReadError) {
>>     cout << "\"" << cmd << "\" failed - skipping\n";
>>     return;
>> }
>
> Can we finally please use configure checks for such weird helper apps,
> to avoid runtime exceptions where the system clearly has no such app?
>
> I once provided a huge patch to do that.
> http://thread.gmane.org/gmane.comp.search.xapian.devel/783/
Perhaps the patch should go in a ticket; that way, we're less likely to
forget about it.
> Applied to 1.0.5 it is attached. But there's much more in this patch
> so some parts may be stripped. See ChangeLog.
> TEXTCAT support for language and charset detection, cached virtual
> directories (zip,msg,pst,...) to name a few. Works fine for me for two
> years and I haven't touched it since 0.9.6.
Sounds useful. However, I'm not sure that configure time is the right
place to check for the existence of helper apps. In particular, quite
often omindex is installed from a pre-compiled package (for example, in
Debian), and the helper apps present at configure time need therefore
bear no relation to those present at runtime.
Perhaps omindex could be improved to handle missing helper applications
- I've not actually looked at how it handles this recently, so I don't
know if there's actually a problem, but if there is, the correct fix
seems to me to be to handle missing helper applications gracefully,
rather than to disable them at configure time. Perhaps omindex could keep
a cache, during each run, of the helper applications which have been
found to be missing, so that it would only attempt to run each one once.
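
A minimal sketch of such a per-run cache, assuming a POSIX shell that
reports exit status 127 when a command cannot be found. The names
HelperCache, Runner, shell_runner and run_helper are hypothetical and do
not come from omindex; this only illustrates the idea.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <set>
#include <string>
#include <sys/wait.h>

// Hypothetical sketch, not omindex's actual code: remembers, for the
// duration of one run, which helper programs were found to be missing.
class HelperCache {
    std::set<std::string> missing_;
  public:
    bool is_missing(const std::string& helper) const {
        return missing_.count(helper) != 0;
    }
    void mark_missing(const std::string& helper) {
        assert(!helper.empty());
        missing_.insert(helper);
    }
};

// Runs a command and returns its exit status. Replaceable, so the caching
// logic can be exercised without the real helpers being installed.
using Runner = std::function<int(const std::string&)>;

// Default runner (assumption: POSIX). Returns -1 if the command could not
// be launched or did not exit normally.
inline int shell_runner(const std::string& cmd) {
    int status = std::system(cmd.c_str());
    if (status == -1 || !WIFEXITED(status)) return -1;
    return WEXITSTATUS(status);
}

// Returns true if the helper ran successfully. A helper already known to
// be missing is skipped without being launched again.
bool run_helper(HelperCache& cache, const std::string& helper,
                const std::string& cmd, const Runner& run = shell_runner) {
    if (cache.is_missing(helper)) return false;
    int status = run(cmd);
    if (status == 127) cache.mark_missing(helper);  // "command not found"
    return status == 0;
}
```

Note that a shell pipeline normally reports only the status of its last
command, so for something like "tifftopnm ... | gocr ..." each helper
would probably need to be checked separately.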
--
Richard