[Xapian-discuss] Omindex Filters
James Aylett
james-xapian at tartarus.org
Thu Sep 18 11:46:45 BST 2008
On Thu, Sep 18, 2008 at 04:11:15AM +0100, Olly Betts wrote:
> > How about XML for the output so we can incorporate any additional
> > meta-data.
>
> That's essentially why Recoll's filters convert to HTML. The main
> issue is that it adds the overhead of the external script converting
> to XML and then omindex parsing the XML to get back to the plain
> text.
I'm -1 on XML as an intermediate format, and -2 on HTML. I'm currently
tending towards the idea that we should initially just implement text,
since that will solve a lot of problems at a reasonable level (and
people can still use scriptindex), and then we can think about more
complex things later. (*Possibly* we could have the filter mechanism
use any of the internal parsers, meaning if you really wanted to
convert to HTML and parse that for extra metadata, you could.)
> It seems that you just have to use $TMPDIR or /tmp and hope that the
> system is sanely configured.
Trust the OS to manage $TMPDIR correctly. On any decent, well-tuned OS
it'll be efficient. (On many systems it'll actually be a tmpfs anyway.)
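Here is a minimal sketch of relying on $TMPDIR for a filter's scratch file, assuming Python. `tempfile.mkstemp()` places the file in `tempfile.gettempdir()`, which checks $TMPDIR first and typically falls back to /tmp on POSIX systems. The helper `filter_via_tempfile` and its `write_output` callback are hypothetical. They stand in for a filter that can only write to an output file rather than to stdout.

```python
import os
import tempfile


def filter_via_tempfile(write_output):
    """Hypothetical helper: let a filter write text into a temporary file
    under $TMPDIR (or the platform default), read it back, then clean up."""
    # mkstemp creates the file securely in tempfile.gettempdir().
    fd, tmp_path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # The callback represents running a filter with tmp_path as its output.
        write_output(tmp_path)
        with open(tmp_path, encoding="utf-8", errors="replace") as f:
            return f.read()
    finally:
        # Always remove the scratch file, even if the filter failed.
        os.unlink(tmp_path)
```

The indexer never chooses a directory itself, so pointing $TMPDIR at a tmpfs (or leaving the system default) is entirely the administrator's decision.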
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org