[Xapian-discuss] Omindex Filters

Jean-Francois Dockes jean-francois.dockes at wanadoo.fr
Sat Oct 25 09:41:58 BST 2008


I am following up on my own (old) post so that this discussion does not
end on something that is now wrong for current (1.11+) Recoll versions.

Summary of the discussion:
jf wrote:
 > Recoll uses a set of external filter scripts which wrap the native
 > text extractors and always output HTML.
 > 
 > Olly Betts writes:
 >  > Conclusion - rcldoc is 42% slower, and I've not factored in the extra
 >  > time omindex would need to spend parsing the HTML.  Now I understand
 >  > that doc isn't a trivial-to-parse format, so I think this crude test
 >  > is indicative.  Also, I did it on Linux which has a low process start
 >  > overhead.  On cygwin this would be much worse.

Recoll 1.11 can now execute external filters that are either its own
scripts or bare, arbitrary translators, outputting either text or HTML. I
still think this was not a major issue, but the changes were simple
enough...
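
As a rough illustration of the two kinds of filter output described above,
here is a minimal Python sketch. It is not Recoll's actual code: the
function run_filter, its parameters, and the MIME type strings used to
select the output handling are all assumptions made for this example.

```python
import subprocess
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the character data of an HTML document, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def run_filter(command, path, output_type):
    """Run an external filter on `path` and return its output as plain text.

    `command` is an argv list and `output_type` is "text/plain" or
    "text/html". Both are hypothetical parameters for this sketch,
    not Recoll configuration settings.
    """
    result = subprocess.run(
        command + [path], capture_output=True, text=True, check=True
    )
    if output_type == "text/html":
        # HTML output needs one more parsing step before indexing.
        collector = _TextCollector()
        collector.feed(result.stdout)
        collector.close()
        # Collapse runs of whitespace left behind by the markup.
        return " ".join("".join(collector.chunks).split())
    # Plain text output from a bare translator is used as-is.
    return result.stdout
```

A bare translator that already writes text skips the HTML parsing step
entirely, which is the extra cost mentioned in the quoted message.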

Only fools ... Thanks for the insight that made me change my mind.

jf


