[Xapian-devel] [Xapian-discuss] Dealing with image PDF's

Olly Betts olly at survex.com
Sat Aug 2 00:59:29 BST 2008


On Fri, Aug 01, 2008 at 11:15:20AM +0100, James Aylett wrote:
> On Thu, Jul 31, 2008 at 12:54:07PM +0100, Richard Boulton wrote:
> > I'm definitely opposed to hardcoding the location of files, incidentally 
> > - there are all sorts of reasons that a user might want to use an 
> > alternative helper file, and allowing them to simply place such a file 
> > somewhere early on PATH is a good way to do this.
> 
> We just want the expected execvp() behaviour, don't we?

Yes, I think so (execvp() is documented as searching PATH for the
program the same way the shell does).
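When the program name contains no slash, execvp() tries each directory
listed in PATH in order and runs the first executable match. A minimal
Python sketch of what that buys us (illustrative only, not Omega's code;
the function names and the "demo-helper" program are hypothetical):
look helpers up by bare name and never hardcode a path, so a user can
override a system helper by putting their own copy earlier on PATH.

```python
import os
import shutil
import subprocess


def find_helper(name, path=None):
    """Return the first executable called `name` found by searching the
    directories in PATH in order (similar to execvp()), or None."""
    return shutil.which(name, path=path)


def run_helper(argv, env=None):
    """Run a filter helper by bare name and return its stdout.

    On POSIX, subprocess (like execvp()) resolves argv[0] against PATH
    when it contains no slash, using the PATH from `env` if one is given.
    """
    result = subprocess.run(argv, env=env, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

For example, run_helper(["pdftotext", "file.pdf", "-"]) would pick up a
user's own pdftotext if it sits in a directory earlier on PATH than the
system one.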

> We could also use something similar to mailcap + mime.types, on
> systems that support them.

The standard mailcap entries are slanted towards displaying content
for a human to view, rather than providing text in a form suitable for
indexing, where we don't much care about formatting.  And for images
and video we want the metadata rather than the content.  But the
mailcap format itself might be a sane choice.
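As a rough illustration of reusing the formats, here is a Python sketch
assuming a hypothetical filter table written in simplified mailcap
syntax ("type; command", with %s for the input file, and only those two
fields handled), with the MIME type guessed from the file extension via
the standard mimetypes module (the mime.types half). The table entries
and function names are examples, not an existing Xapian config.

```python
import mimetypes
import shlex

# Hypothetical filter table in simplified mailcap syntax. Each command is
# assumed to write plain text to stdout; %s stands for the input file.
FILTERS = """
application/pdf; pdftotext -enc UTF-8 %s -
application/msword; antiword %s
"""


def parse_filters(text):
    """Parse "type; command" lines into {mime_type: command_template}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        mime_type, _, command = line.partition(";")
        table[mime_type.strip().lower()] = command.strip()
    return table


def filter_argv(table, filename):
    """Guess filename's MIME type from its extension and return the argv
    for the matching filter, or None if no filter is configured."""
    # MimeTypes() uses Python's built-in extension table; MimeTypes.read()
    # could load a system mime.types file instead.
    mime_type, _ = mimetypes.MimeTypes().guess_type(filename)
    template = table.get(mime_type)
    if template is None:
        return None
    # Substitute %s per token instead of building a shell string, so
    # file names containing spaces or quotes need no escaping.
    return [filename if tok == "%s" else tok for tok in shlex.split(template)]
```

Splitting the command into an argv list also fits the execvp()
approach: the filter is found on PATH with no shell involved.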

Recoll uses a filter system which seems to be taken from Estraier.  For
each format, a shell script does the work, but it has to output HTML,
which often seems to require a run through sed to escape '<', '>', and
'&'; then the indexer has to parse the HTML again, which all seems a
bit unnecessary.  But it might be nice to support such filter scripts
as an option.
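To make the extra work concrete, a minimal Python sketch of that round
trip (function names hypothetical; this is not Recoll's or Estraier's
code): the filter escapes the extracted text and wraps it in HTML, and
the indexer parses the HTML just to get the same text back.

```python
import html
from html.parser import HTMLParser


def filter_output_as_html(plain_text):
    """Filter side: wrap extracted text in HTML, escaping '<', '>' and
    '&' (the job sed does in the shell scripts)."""
    escaped = html.escape(plain_text, quote=False)
    return "<html><body><pre>" + escaped + "</pre></body></html>"


class _TextCollector(HTMLParser):
    """Indexer side: collect the text content of the HTML."""

    def __init__(self):
        # convert_charrefs=True turns &lt; &gt; &amp; back into
        # characters before handle_data() sees them.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def text_from_html(markup):
    """Parse the filter's HTML and return the text it contained."""
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)
```

The indexer ends up with exactly the text the filter started from, so
for plain-text output both steps are pure overhead.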

Cheers,
    Olly
