Plain text files without extension

Wilbert van Bakel wilbert.vanbakel at gmail.com
Fri Dec 20 01:52:41 GMT 2024


Many thanks for your response.

libmagic is enabled. `file` reports the following types for these files:

   - ASCII text
   - data
   - news or mail, ISO-8859 text
   - news or mail, ASCII text
   - news or mail, Unicode text, UTF-8 text
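
A tally like the one above can be gathered with a small shell sketch along
these lines. It is only an illustration: the function names are made up
here, "no extension" is assumed to mean "no dot anywhere in the file name",
and the `file` command needs to be installed.

```shell
# List regular files under a directory whose names contain no dot,
# i.e. files without an extension. Dotfiles such as .profile are
# skipped too, since their names contain a dot.
list_extensionless() {
    find "${1:-.}" -type f ! -name '*.*'
}

# Count how often `file` (which uses libmagic) reports each description
# for those files, most frequent first.
# Assumes file names do not contain newlines.
tally_file_types() {
    list_extensionless "$1" | while IFS= read -r path; do
        file --brief "$path"
    done | sort | uniq -c | sort -rn
}
```

Calling `tally_file_types /path/to/files` prints one line per distinct
description with its count, which shows how many files would be affected by
an override for the empty extension.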


When I install HTML/Entities.pm and MIME/Parser.pm I get errors for every
file ("Garbage at end of string in strptime", "Perhaps a format flag did
not match the actual input?").

I get the best result with --mime-type=:text/plain; many thanks for this
suggestion.
Wilbert


On Thu, Dec 19, 2024 at 6:37 PM Olly Betts <olly at survex.com> wrote:

> On Thu, Dec 19, 2024 at 03:17:13PM -0600, Wilbert van Bakel wrote:
> > I have many plain text files that don't have an extension.
> > I notice that omindex is skipping them.
> > Is there a way to include these files?
>
> Are you using a build of omega with libmagic support enabled
> (it's optional in 1.4.x, but will be a hard requirement in the next
> release series)?  If not, I'd try using a build with that as then
> omindex should ask libmagic to inspect files it has no mapping for and
> index based on the reported type.
>
> If you are, then possibly libmagic fails to recognise that these files are
> plain text, or detects a different type for them - I think
> the omindex output reports the detected MIME content type, but you can
> also approximate this check with the command line `file` tool (which
> also uses libmagic):
>
> $ file --mime-type README
> README: text/plain
>
> If libmagic's answer is the problem then you can add a mapping for the
> empty extension to override libmagic for such files, e.g.:
>
> omindex --mime-type=:text/plain
>
> The downside of this approach is that this will get applied to any file
> without an extension, and you may have some such files which aren't
> plain text.
>
> If the files you want to match have a naming pattern (say their names
> all start `README`) then you can match them based on a glob pattern,
> e.g.:
>
> omindex --mime-type-match='README*':text/plain
>
> You can specify multiple patterns if needed.
>
> If libmagic is consistently detecting a different type for these files
> (and not detecting that type for non-plain-text files) it'd be handy
> to be able to tell omindex to "treat text/something like text/plain".
> There's not an option to explicitly do this currently, but you should
> be able to achieve it by creative use of `--filter`:
>
> omindex --filter=text/something:cat
>
> The downside is that this will pass the contents of such files through
> `cat` in order to read them.
>
> Hope that helps.
>
> Cheers,
>     Olly
>
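
Since multiple patterns can be given, the glob approach quoted above can be
scripted when there are several name patterns to map. The following is a
minimal sketch only: the helper name and the `NOTES*` pattern are
hypothetical, and it just prints the options rather than running omindex.

```shell
# Print one --mime-type-match option per glob pattern given,
# each mapping that pattern to text/plain.
mime_match_options() {
    for pattern in "$@"; do
        printf '%s\n' "--mime-type-match=${pattern}:text/plain"
    done
}

# Example with one pattern from the thread and one hypothetical pattern.
mime_match_options 'README*' 'NOTES*'
```

The printed options can then be added to the rest of the usual omindex
command line.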


More information about the Xapian-discuss mailing list