[Xapian-discuss] Omindex Filters
Jean-Francois Dockes
jean-francois.dockes at wanadoo.fr
Sat Oct 25 09:41:58 BST 2008
I am following up on my own (old) post so that this discussion does not
end on something which is now wrong for current (1.11+) Recoll versions.
Summary of the discussion:
jf wrote:
> Recoll uses a set of external filter scripts which wrap the native
> text extracters and always output html.
>
> Olly Betts writes:
> > Conclusion - rcldoc is 42% slower, and I've not factored in the extra
> > time omindex would need to spend parsing the HTML. Now I understand
> > that doc isn't a trivial to parse format, so I think this crude test
> > is indicative. Also, I did it on Linux which has a low process start
> > overhead. On cygwin this would be much worse.
Recoll 1.11 can now execute external filters which are either its own
scripts or bare arbitrary translators, outputting either text or HTML. I
still think this was not a major issue, but the changes were simple
enough...
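
To illustrate the idea (this is not Recoll's actual code, just a minimal
Python sketch under my own assumptions): the indexer runs an external
command on a file, and a declared output type tells it whether the result
is plain text or HTML that still needs parsing. The names run_filter and
html_to_text, and the way the output type is passed in, are hypothetical.

```python
import subprocess
import sys
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Crude HTML-to-text step (script/style content is not filtered out)."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Join the text nodes and collapse runs of whitespace.
    return " ".join("".join(collector.parts).split())


def run_filter(command, path, output_type):
    """Run a hypothetical external filter on `path` and return plain text.

    command:     argv list for the filter (a wrapper script or a bare converter)
    output_type: "text/plain" or "text/html", as declared for that filter
    """
    result = subprocess.run(command + [path], capture_output=True, check=True)
    output = result.stdout.decode("utf-8", errors="replace")
    if output_type == "text/html":
        return html_to_text(output)
    if output_type == "text/plain":
        return output.strip()
    raise ValueError("unsupported filter output type: " + output_type)


if __name__ == "__main__":
    # Stand-in "filter": a one-line Python program that emits HTML.
    fake_filter = [sys.executable, "-c",
                   "import sys; print('<p>Hello <b>' + sys.argv[1] + '</b></p>')"]
    print(run_filter(fake_filter, "doc", "text/html"))
```

With a bare converter that writes plain text, the HTML parsing step is
skipped entirely, which is where the saving discussed above would come from.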
Only fools ... Thanks for the insight that made me change my mind.
jf