[Xapian-discuss] Omindex Filters
Jean-Francois Dockes
jean-francois.dockes at wanadoo.fr
Wed Sep 17 09:11:28 BST 2008
Olly Betts writes:
> I think you need to at least consider the character set and format of
> the output (plain text and HTML are common), and possibly also filters
> which can only produce output to a file, not stdout. Meta-data is
> another issue (look at the PDF handling for example).
For what it's worth, the way Recoll handles this is to have all external
filters output HTML (using a wrapper script in most cases). Character set
and metadata are then declared as usual in the HTML head section.
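To illustrate the idea, here is a minimal sketch of such a wrapper's core. It is not Recoll's actual code: the function names `html_escape` and `to_html` are made up for this example, and it assumes the converter writes plain text in a known character set to stdout.

```shell
#!/bin/sh
# Hypothetical helpers, not taken from Recoll or omindex.

# Escape the characters that are special in HTML; '&' must be handled first.
html_escape() {
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# to_html CHARSET TITLE
# Reads plain text on stdin and writes a minimal HTML document on stdout,
# declaring the character set and a title in the head section.
to_html() {
    charset=$1
    title=$2
    printf '<html><head>\n'
    printf '<meta http-equiv="Content-Type" content="text/html; charset=%s">\n' "$charset"
    printf '<title>%s</title>\n' "$(printf '%s' "$title" | html_escape)"
    printf '</head><body><pre>\n'
    html_escape
    printf '</pre></body></html>\n'
}
```

With a converter that writes to stdout (the `foo2txt` name is only borrowed from the example quoted further down), the wrapper body would be something like: foo2txt "$1" | to_html iso-8859-1 "$1". Other metadata could be added as further meta elements in the same way.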
> It's true that some such issues can be handled to at least some extent
> with a wrapper script around the command, but then you're adding the
> overhead of forking several extra commands per file processed, which
> is better avoided.
One can't but agree with this. But the kinds of documents which would
use an external filter are either unusual or heavy-weight (the rest can
stay in-process). Executing a few additional commands for these may prove
not to be a major issue.
> We also don't want to encourage hacky handling of temporary files as
> that's a route straight to security bugs via symlink attacks - an
> obvious but bad approach to handling output to a file is a wrapper
> script like this one:
>
> #!/bin/sh
> foo2txt "$1" /tmp/$$.txt
> cat /tmp/$$.txt
> rm /tmp/$$.txt
You can do this without the shell too :)
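For contrast with the quoted script, here is a sketch of one way to handle a file-only converter without a predictable name in /tmp. It assumes a `mktemp` command that supports `-d` (common on Linux and the BSDs, but not required by POSIX); `run_to_stdout` is a name invented for this example, and the converter is assumed to take an input path and an output path as its two arguments, like the `foo2txt` above.

```shell
#!/bin/sh
# run_to_stdout CONVERTER INPUT
# Runs "CONVERTER INPUT OUTFILE" with OUTFILE inside a private temporary
# directory, copies OUTFILE to stdout, then removes the directory.
run_to_stdout() {
    converter=$1
    input=$2
    # mktemp -d creates a fresh directory with an unpredictable name that
    # only the current user can access, so nobody else can plant a symlink
    # at the output path beforehand.
    tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/filter.XXXXXX") || return 1
    "$converter" "$input" "$tmpdir/out.txt"
    status=$?
    if [ "$status" -eq 0 ]; then
        cat "$tmpdir/out.txt"
        status=$?
    fi
    # Clean up whether or not the conversion succeeded.
    rm -rf "$tmpdir"
    return "$status"
}
```

A wrapper would then just call: run_to_stdout foo2txt "$1". This sketch does not clean up if the script is killed by a signal; a trap could be added for that.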
jf