[Xapian-tickets] [Xapian] #569: Generate omindex docs and code relating to file types

Xapian nobody at xapian.org
Tue Apr 10 21:01:17 BST 2018


#569: Generate omindex docs and code relating to file types
--------------------+-----------------------------
 Reporter:  catkin  |             Owner:  olly
     Type:  defect  |            Status:  assigned
 Priority:  normal  |         Milestone:  1.4.x
Component:  Omega   |           Version:  1.2.5
 Severity:  normal  |        Resolution:
 Keywords:          |        Blocked By:
 Blocking:          |  Operating System:  All
--------------------+-----------------------------
Description changed by olly:

Old description:

> We should try to generate all the docs and code relating to file types
> from a common source to ensure they stay in step with one another.
>
> ----
> ''Original description:''
>
> From the omindex man page:
>
> {{{
> -F, --filter=TYPE:CMD
>     process files with MIME Content-Type TYPE using command CMD, which
>     should produce UTF-8 text on stdout, e.g.
>     -Fapplication/octet-stream:'strings -n8'
> }}}
>
> This could be read as meaning that omindex examines the contents of files
> to determine their MIME type (that is how I read it), but according to
> Olly's posting "Re: [Xapian-discuss] Tika 0.8 failure rates" (5 Oct 2011):
>
> By default, omindex currently uses a list of extension->MIME
> content-type mappings, and only consults the magic library for
> extensions it doesn't know.  So any file with a .doc extension will be
> considered as application/msword (unless you run omindex with
> '--mime-type=doc:').
>
> A note about this could be added to the omindex man page and referenced
> from the descriptions of the -F and -M options.

New description:

 We should try to generate all the docs and code relating to file types
 from a common source to ensure they stay in step with one another.
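 As a minimal sketch of the idea (not Xapian's actual build machinery), a
 single table could drive both the code and the docs. The table name
 `FILE_TYPES`, the generated C++ array `default_mime_map` and the plain-text
 doc format are all hypothetical; the three mappings are only examples.

```python
# Hypothetical single source of truth: extension -> MIME type.
# The entries are illustrative; the real omindex list is much longer.
FILE_TYPES = [
    ("doc", "application/msword"),
    ("pdf", "application/pdf"),
    ("txt", "text/plain"),
]


def generate_cpp(table):
    """Emit a C++ array initializer for an extension -> MIME type table."""
    lines = ["// Generated file: do not edit by hand."]
    lines.append(
        "static const std::pair<const char*, const char*> default_mime_map[] = {"
    )
    for ext, mime in sorted(table):
        lines.append(f'    {{"{ext}", "{mime}"}},')
    lines.append("};")
    return "\n".join(lines)


def generate_docs(table):
    """Emit a plain-text list suitable for including in the man page."""
    width = max(len(ext) for ext, _ in table) + 1
    return "\n".join(f"  .{ext:<{width}} {mime}" for ext, mime in sorted(table))


if __name__ == "__main__":
    print(generate_cpp(FILE_TYPES))
    print()
    print(generate_docs(FILE_TYPES))
```

 Because both outputs come from the same list, adding or changing a mapping
 in one place updates the code and the documentation together.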

 ----
 ''Original description:''

 From the omindex man page:

 {{{
 -F, --filter=TYPE:CMD
     process files with MIME Content-Type TYPE using command CMD, which
     should produce UTF-8 text on stdout, e.g.
     -Fapplication/octet-stream:'strings -n8'
 }}}
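 To make the `TYPE:CMD` syntax concrete, here is a hypothetical parser
 (`parse_filter_option` is not an omindex function). It assumes the value is
 split at the first colon, since a MIME type contains no colon but a command
 might. The shell strips the single quotes around `strings -n8` before
 omindex sees the argument.

```python
def parse_filter_option(value):
    """Split a --filter=TYPE:CMD value into (TYPE, CMD).

    Assumption: split at the first colon, because a MIME type never
    contains one while the command may.
    """
    mime_type, sep, command = value.partition(":")
    if not sep or not mime_type or not command:
        raise ValueError(f"expected TYPE:CMD, got {value!r}")
    return mime_type, command


# Filters are keyed by MIME type, so which command runs depends entirely
# on the type omindex assigns to the file.
filters = {}
mime_type, command = parse_filter_option("application/octet-stream:strings -n8")
filters[mime_type] = command
print(filters)  # {'application/octet-stream': 'strings -n8'}
```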

 This could be read as meaning that omindex examines the contents of files
 to determine their MIME type (that is how I read it), but according to
 Olly's posting "Re: [Xapian-discuss] Tika 0.8 failure rates" (5 Oct 2011):

 By default, omindex currently uses a list of extension->MIME
 content-type mappings, and only consults the magic library for
 extensions it doesn't know.  So any file with a .doc extension will be
 considered as application/msword (unless you run omindex with
 '--mime-type=doc:').
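 The lookup order described above can be sketched as follows.
 `detect_mime_type` and `apply_mime_type_option` are hypothetical names,
 `magic_lookup` stands in for the magic library, and lower-casing the
 extension is an assumption. The key point is that a known extension
 short-circuits any inspection of the file's contents, and that an empty
 TYPE in `--mime-type=EXT:` is assumed (as the posting implies) to remove the
 mapping so the magic library is consulted instead.

```python
import os


def detect_mime_type(filename, mime_map, magic_lookup):
    """Return a MIME type using the order described in the posting.

    mime_map maps an extension (without the dot) to a MIME type.
    magic_lookup is a stand-in for the magic library; it is only
    called when the extension has no entry in mime_map.
    """
    ext = os.path.splitext(filename)[1][1:].lower()  # lower-casing is assumed
    if ext in mime_map:
        return mime_map[ext]  # the file's contents are never examined
    return magic_lookup(filename)


def apply_mime_type_option(mime_map, value):
    """Apply a --mime-type=EXT:TYPE value; an empty TYPE drops the mapping."""
    ext, _, mime_type = value.partition(":")
    if mime_type:
        mime_map[ext] = mime_type
    else:
        mime_map.pop(ext, None)


def pretend_magic(path):
    # Pretend every file's contents look like plain text.
    return "text/plain"


mime_map = {"doc": "application/msword"}
print(detect_mime_type("notes.doc", mime_map, pretend_magic))  # application/msword
apply_mime_type_option(mime_map, "doc:")
print(detect_mime_type("notes.doc", mime_map, pretend_magic))  # text/plain
```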

 A note about this could be added to the omindex man page and referenced
 from the descriptions of the -F and -M options.

--

--
Ticket URL: <https://trac.xapian.org/ticket/569#comment:15>
Xapian <https://xapian.org/>
Xapian