[Xapian-discuss] Indexing PDF files with scriptindex
James Aylett
james-xapian at tartarus.org
Mon Sep 24 11:42:39 BST 2007
On Sun, Sep 23, 2007 at 11:15:52PM +0100, Olly Betts wrote:
> At the moment "load" just loads the literal contents of the file.
> So you can use it for plain text, and also for HTML (thanks to the
> "unhtml" action). You could use it for some XML files (if simply
> stripping the tags out is the right approach, then "unhtml" will do
> that for XML too). But it doesn't support running filter programs.
It wouldn't be terribly difficult to write a new action
filter=FILTERCMD to do this. It's slightly more fiddly to get it to be
efficient, but it could be done.
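
As a rough illustration of what such an action would have to do, here is a minimal Python sketch. It is not scriptindex code, and the filter=FILTERCMD action does not exist yet. The function name run_filter and the "%f" placeholder for the file name are assumptions made up for this example. It runs the filter command on a file and returns what the command writes to stdout, which would then be indexed in place of the file's literal contents.

```python
import shlex
import subprocess


def run_filter(filter_cmd: str, path: str, timeout: float = 60.0) -> str:
    """Run a filter command on `path` and return its stdout as text.

    `filter_cmd` is a command line in which the hypothetical placeholder
    "%f" marks where the file name goes (an assumption of this sketch,
    not scriptindex syntax).
    """
    # Split the command line shell-style, then substitute the file name.
    # No shell is involved, so odd characters in `path` are harmless.
    argv = [path if arg == "%f" else arg for arg in shlex.split(filter_cmd)]

    # check=True raises CalledProcessError if the filter exits non-zero;
    # timeout guards against a filter that hangs on a bad input file.
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,
    )

    # Assume the filter emits UTF-8; replace anything undecodable.
    return result.stdout.decode("utf-8", errors="replace")
```

For PDFs, assuming pdftotext is installed, the call would look something like run_filter("pdftotext %f -", "paper.pdf"). Spawning one process per document is the simple approach; this sketch makes no attempt at the efficiency work mentioned above.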
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org