[GSoC] Questions about project Text-Extraction Libraries

Olly Betts olly at survex.com
Wed Mar 27 23:40:33 GMT 2019


On Wed, Mar 27, 2019 at 12:52:15PM -0300, Bruno Baruffaldi wrote:
> One last question: I was wondering whether it would be worth trying an
> external filter (when one is available) in case a particular library
> fails at run time.
> 
> Have you considered it?

In most cases the external filter (e.g. key2text) is just a thin wrapper
around the library we'd use (e.g. libetonyek), and if extracting with the
library fails there's little point retrying, as it will be the same file
and the same code parsing it, just called via a different route.
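
As a rough illustration of why the retry gains nothing, here is a toy
sketch. It is not Xapian code: every name in it is hypothetical, and the
"OK" prefix check merely stands in for whatever makes a real parser reject
a file. It assumes the external filter adds no parsing logic of its own.

```python
class ExtractionError(Exception):
    """Raised when the (hypothetical) parser cannot handle a file."""


def parse_with_library(data: bytes) -> str:
    # Hypothetical stand-in for a parsing library such as libetonyek.
    # Toy rule: only input starting with b"OK" is considered parseable.
    if not data.startswith(b"OK"):
        raise ExtractionError("unsupported or corrupt input")
    return data[2:].decode("utf-8")


def external_filter(data: bytes) -> str:
    # Hypothetical stand-in for a thin command-line wrapper such as
    # key2text: it just hands the same bytes to the same library.
    return parse_with_library(data)


def extract_with_fallback(data: bytes) -> str:
    try:
        return parse_with_library(data)
    except ExtractionError:
        # The "fallback" reruns the same parser on the same bytes via a
        # different route, so it fails in exactly the same way.
        return external_filter(data)
```

With these toy definitions, a file the library rejects is rejected again by
the fallback, so the extra attempt only costs time.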

There are a few formats where there's a choice of filter, but even there,
being able to specify a list of filters to try seems of limited benefit -
the files which fail should be rare exceptions (or else the filter just
isn't up to the job), and it's better for the user to know about them so
they can file a bug against the filter.

Cheers,
    Olly


