[Xapian-discuss] Indexing PDF files with scriptindex

James Aylett james-xapian at tartarus.org
Mon Sep 24 11:42:39 BST 2007


On Sun, Sep 23, 2007 at 11:15:52PM +0100, Olly Betts wrote:

> At the moment "load" just loads the literal contents of the file.
> So you can use it for plain text, and also for HTML (thanks to the
> "unhtml" action).  You could use it for some XML files (if simply
> stripping the tags out is the right approach, then "unhtml" will do
> that for XML too).  But it doesn't support running filter programs.

It wouldn't be terribly difficult to write a new action
filter=FILTERCMD to do this. It's slightly more fiddly to get it to be
efficient, but it could be done.
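
As a rough illustration of what such an action would have to do (run an
external command on the file and take its standard output as the field
text), here is a minimal Python sketch. It is not scriptindex code: the
function name run_filter and the "%f" placeholder for the file name are
assumptions made up for this example, and "pdftotext %f -" is just one
filter command you might plug in.

```python
import shlex
import subprocess


def run_filter(filter_cmd, path, timeout=60):
    """Run an external filter on `path` and return its stdout as text.

    `filter_cmd` is a command string in which the argument "%f"
    (a placeholder invented for this sketch) stands for the file name,
    e.g. "pdftotext %f -".
    """
    # Split the command into arguments without involving a shell, then
    # substitute the file name for the placeholder argument.
    argv = [path if arg == "%f" else arg for arg in shlex.split(filter_cmd)]
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,  # do not hang forever on a stuck filter
        check=True,       # raise CalledProcessError on non-zero exit
    )
    # Assume the filter emits UTF-8; replace any bytes that do not decode.
    return result.stdout.decode("utf-8", errors="replace")
```

This version starts one process per file, which is probably where the
efficiency concern comes from when indexing many documents.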

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


