[Xapian-devel] [Xapian-discuss] Dealing with image PDF's
Olly Betts
olly at survex.com
Sat Aug 2 00:59:29 BST 2008
On Fri, Aug 01, 2008 at 11:15:20AM +0100, James Aylett wrote:
> On Thu, Jul 31, 2008 at 12:54:07PM +0100, Richard Boulton wrote:
> > I'm definitely opposed to hardcoding the location of files, incidentally
> > - there are all sorts of reasons that a user might want to use an
> > alternative helper file, and allowing them to simply place such a file
> > somewhere early on PATH is a good way to do this.
>
> We just want the expected execvp() behaviour, don't we?
Yes, I think so (execvp() is documented as doing it like the shell
does).
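
As a rough illustration of that lookup (a minimal sketch in Python, not Xapian or omindex code; `find_helper`, `run_helper` and the helper name in the usage note are hypothetical), the first matching executable on the search path wins, so a user can override a helper by placing their own earlier on PATH:

```python
import os
import shutil


def find_helper(name, search_path=None):
    """Return the first executable called `name` on the search path, or None.

    search_path is an os.pathsep-separated string of directories;
    None means "use the PATH environment variable".
    """
    return shutil.which(name, path=search_path)


def run_helper(name, args):
    """Replace the current process with the helper, resolved via PATH.

    os.execvp() performs the PATH search itself, so no location is hardcoded.
    """
    os.execvp(name, [name] + list(args))
```

For example, `find_helper("pdf-helper", "/home/me/bin" + os.pathsep + "/usr/bin")` would return the copy in `/home/me/bin` if one exists there and is executable, and fall back to `/usr/bin` otherwise.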
> We could also use something similar to mailcap + mime.types, on
> systems that support them.
The standard mailcap file entries are slanted too much towards human
viewability rather than providing text in a suitable form for indexing
without caring much about formatting. And for images and video we
want the meta-data rather than the content. But the format might be
a sane choice.
Recoll uses a filter system which seems to be taken from Estraier. It
uses a shell script which does the work for each format, but it has to
output HTML which often seems to require a run through sed to escape
'<', '>', and '&', and then the indexer has to parse the HTML, which all
seems a bit unnecessary. But it might be nice to support such filter
scripts as an option.
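
To make the round trip concrete, here is a minimal sketch (Python standard library only; the function names are hypothetical and this is not how Recoll or Estraier implement it) of the escaping a filter has to apply to '<', '>', and '&', and the matching unescaping the indexer then has to do to get back the original text:

```python
import html


def escape_for_filter_output(text):
    """Escape '&', '<' and '>' so plain text can be embedded in HTML.

    quote=False leaves quotation marks alone, matching the three
    characters mentioned above.
    """
    return html.escape(text, quote=False)


def recover_text(escaped):
    """Undo the escaping, as an indexer parsing the HTML would have to."""
    return html.unescape(escaped)
```

The second step only recovers what the filter already had before escaping, which is the extra work referred to above.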
Cheers,
Olly