[Xapian-discuss] index everything? (no extensions/no mime-types)

Olly Betts olly at survex.com
Mon Mar 7 00:56:15 GMT 2011


On Wed, Mar 02, 2011 at 02:02:33PM -0600, Jeremy C. Reed wrote:
> On Sun, 20 Feb 2011, Olly Betts wrote:
> 
> > There isn't a way to set a content-type regardless of extension
> > currently.  Not sure that I can see a good use case for that.
> 
> I have maybe over a hundred different unknown MIME types (troff, x-tex,
> pascal, fortran, x-c, x-c++, and much more) and I am sure it will
> change.
> 
> If it is unknown I want it to fall back to just assume it is text or at
> least run strings on it.
> 
> I need everything that might have text in it indexed (so I can skip
> images, videos, sound files).

It would be easy to allow a filter to be specified for just the first
part of the content-type (the part before the "/") as a fallback.  For
example, text/html would still be handled as text/html, but text/x-c++
would be handled as text, unless you also specified handling for
text/x-c++.  I can see it being useful to be able to pass all subtypes
to a filter for other top-level types too.
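
To make the fallback idea concrete, here is a rough sketch of the lookup
order described above.  It is not omindex code: the function name
find_filter, the filters table, and the filter labels are all made up
for illustration.

```python
def find_filter(content_type, filters):
    """Look up a filter for content_type, falling back to its first part.

    `filters` is a hypothetical table mapping either a full content-type
    ("text/html") or just the first part ("text") to a filter label.
    """
    # An exact match on the full content-type always wins.
    if content_type in filters:
        return filters[content_type]
    # Otherwise fall back to the part before the "/".
    major = content_type.split("/", 1)[0]
    # None means no filter is configured, so the file would be skipped.
    return filters.get(major)


# Hypothetical configuration: a specific handler for HTML, plus a
# catch-all for every other text/* subtype.
filters = {
    "text/html": "html-parser",
    "text": "plain-text",
}

for ctype in ("text/html", "text/x-c++", "image/png"):
    print(ctype, "->", find_filter(ctype, filters))
```

With a table like this, text/x-c++ and any other unlisted text/* subtype
would reach the plain-text handler, while image/png would find no filter
and be skipped.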

If you or someone else wants to work on a patch, go for it.  Otherwise
I'll try to sort it out, but it might take a while before I get to it.

Cheers,
    Olly


