[Xapian-discuss] index everything? (no extensions/no mime-types)
Olly Betts
olly at survex.com
Mon Mar 7 00:56:15 GMT 2011
On Wed, Mar 02, 2011 at 02:02:33PM -0600, Jeremy C. Reed wrote:
> On Sun, 20 Feb 2011, Olly Betts wrote:
>
> > There isn't a way to set a content-type regardless of extension
> > currently. Not sure that I can see a good use case for that.
>
> I have maybe over a hundred different unknown MIME types (troff, x-tex,
> pascal, fortran, x-c, x-c++, and many more) and I am sure the list will
> change.
>
> If it is unknown I want it to fall back to just assume it is text or at
> least run strings on it.
>
> I need everything that might have text in it indexed (so I can skip
> images, videos, sound files).

It would be easy to allow a filter to be specified for just the first
part of the content-type as a fallback - e.g. text/html would be handled
as text/html, but text/x-c++ as text (unless you specified handling for
text/x-c++ too). I can see it being useful to be able to pass all
subtypes to a filter for other types too.
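
As a rough illustration of that fallback lookup, here is a minimal sketch in Python. It is not omindex code: the function name pick_filter, the filters table and the filter names in it are all hypothetical, and only the order of the lookup (exact content-type first, then the part before the slash) is taken from the description above.

```python
def pick_filter(content_type, filters):
    """Return the filter registered for content_type, or for its first part.

    `filters` is a hypothetical mapping from either a full content-type
    ("text/html") or just its first part ("text") to a filter name.
    Returns None when neither is registered, i.e. the file is skipped.
    """
    # An exact match always wins, so text/html keeps its own handling.
    if content_type in filters:
        return filters[content_type]
    # Otherwise fall back to the part before the slash, e.g. "text".
    major = content_type.split("/", 1)[0]
    return filters.get(major)


# Hypothetical configuration: one specific handler plus a text fallback.
filters = {
    "text/html": "html-parser",
    "text": "plain-text",
}

html_filter = pick_filter("text/html", filters)   # exact match
cpp_filter = pick_filter("text/x-c++", filters)   # falls back to "text"
png_filter = pick_filter("image/png", filters)    # nothing registered
```

With this table, text/x-c++ is treated as plain text unless an entry for text/x-c++ is added, while types such as image/png have no match and would be skipped.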

If you or someone else wants to work on a patch, go for it. Otherwise
I'll try to sort it out, but it might take a while before I get to it.

Cheers,
Olly