Plain text files without extension
Wilbert van Bakel
wilbert.vanbakel at gmail.com
Fri Dec 20 01:52:41 GMT 2024
Many thanks for your response.
libmagic is enabled. Depending on the file, `file` reports one of:
- ASCII text
- data
- news or mail, ISO-8859 text
- news or mail, ASCII text
- news or mail, Unicode text, UTF-8 text
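To see how often each type comes up as a MIME type (which is closer to what omindex works with), something like the following could be used. This is only a sketch: it assumes a POSIX shell plus the standard `find` and `file` tools, the function names are made up for illustration, and the directory path is a placeholder.

```shell
#!/bin/sh
# List regular files under directory $1 whose names contain no dot,
# i.e. files without an extension. (Dotfiles such as .profile are
# therefore excluded as well.)
list_extensionless() {
    find "$1" -type f ! -name '*.*'
}

# Count how many extensionless files libmagic assigns to each MIME type,
# most frequent type first.
tally_mime_types() {
    find "$1" -type f ! -name '*.*' -exec file --brief --mime-type {} + |
        sort | uniq -c | sort -rn
}

# Example (placeholder path):
#   tally_mime_types /path/to/documents
```

The types listed with the highest counts are the candidates for an override.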
When I install HTML/Entities.pm and MIME/Parser.pm, I get errors for every
file ("Garbage at end of string in strptime", "Perhaps a format flag did
not match the actual input?").
I get the best result with --mime-type=:text/plain. Many thanks for this
suggestion.
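Since a mapping for the empty extension applies to every file without an extension, it may be worth checking first which of those files libmagic does not consider text at all. A rough sketch, assuming a POSIX shell and the standard `find` and `file` tools; `is_text_type` and `non_text_extensionless` are hypothetical helper names, and file names containing newlines are not handled:

```shell
#!/bin/sh
# Succeed if the MIME type given as $1 is in the text/ family.
is_text_type() {
    case $1 in
        text/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print "type<TAB>path" for each extensionless file under directory $1
# that libmagic does not report as text/*.
non_text_extensionless() {
    find "$1" -type f ! -name '*.*' | while IFS= read -r f; do
        t=$(file --brief --mime-type "$f")
        if ! is_text_type "$t"; then
            printf '%s\t%s\n' "$t" "$f"
        fi
    done
}

# Example (placeholder path):
#   non_text_extensionless /path/to/documents
```

Anything this prints would be indexed as plain text under the empty-extension mapping, so the list is worth a quick look before indexing.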
Wilbert
On Thu, Dec 19, 2024 at 6:37 PM Olly Betts <olly at survex.com> wrote:
> On Thu, Dec 19, 2024 at 03:17:13PM -0600, Wilbert van Bakel wrote:
> > I have many plain text files that don't have an extension.
> > I notice that omindex is skipping them.
> > Is there a way to include these files?
>
> Are you using a build of omega with libmagic support enabled
> (it's optional in 1.4.x, but will be a hard requirement in the next
> release series)? If not, I'd try using a build with that as then
> omindex should ask libmagic to inspect files it has no mapping for and
> index based on the reported type.
>
> If you are, then possibly libmagic fails to recognise that these files
> are plain text, or detects a different type for them - I think
> the omindex output reports the detected MIME content type, but you can
> also approximate this check with the command line `file` tool (which
> also uses libmagic):
>
> $ file --mime-type README
> README: text/plain
>
> If libmagic's answer is the problem then you can add a mapping for the
> empty extension to override libmagic for such files, e.g.:
>
> omindex --mime-type=:text/plain
>
> The downside of this approach is that this will get applied to any file
> without an extension, and you may have some such files which aren't
> plain text.
>
> If the files you want to match have a naming pattern (say their names
> all start `README`) then you can match them based on a glob pattern,
> e.g.:
>
> omindex --mime-type-match='README*':text/plain
>
> You can specify multiple patterns if needed.
>
> If libmagic is consistently detecting a different type for these files
> (and not detecting that type for non-plain-text files) it'd be handy
> to be able to tell omindex to "treat text/something like text/plain".
> There's not an option to explicitly do this currently, but you should
> be able to achieve it by creative use of `--filter`:
>
> omindex --filter=text/something:cat
>
> The downside is that this will pass the contents of such files through
> `cat` in order to read them.
>
> Hope that helps.
>
> Cheers,
> Olly
>