[Xapian-tickets] [Xapian] #114: Use libmagic or libextractor instead of own MIME mappings and extractions

Xapian nobody at xapian.org
Tue Sep 30 08:19:03 BST 2008


#114: Use libmagic or libextractor instead of own MIME mappings and extractions
-------------------------+--------------------------------------------------
 Reporter:  nemesis      |        Owner:  olly    
     Type:  enhancement  |       Status:  assigned
 Priority:  normal       |    Milestone:  1.1.0   
Component:  Omega        |      Version:  SVN HEAD
 Severity:  minor        |   Resolution:          
 Keywords:               |    Blockedby:          
 Platform:  All          |     Blocking:          
-------------------------+--------------------------------------------------
Changes (by olly):

  * milestone:  => 1.1.0


Old description:

> Hello,
>
> I locally first modified omindex to use libmagic's MIME database,
> instead of hard coding the MIME type to file extension mapping.  This
> ensures that the internally used MIME types are more consistent with
> accepted standard types.
>
> Then I went further and instead of using file extensions to determine
> type, used libmagic to fingerprint the files.  This is slower, but
> ensures that the file actually is identified correctly even if the
> extension is wrong.
>
> Now I am using libextractor to actually extract the metadata from the
> file, instead of calling these external programs inside omindex based
> on the MIME type.  Using libextractor greatly simplifies omindex.
>
> Is anyone interested in these modifications?

New description:

 Hello,

 I locally first modified omindex to use libmagic's MIME database,
 instead of hard coding the MIME type to file extension mapping.  This
 ensures that the internally used MIME types are more consistent with
 accepted standard types.

 Then I went further and instead of using file extensions to determine
 type, used libmagic to fingerprint the files.  This is slower, but
 ensures that the file actually is identified correctly even if the
 extension is wrong.

 Now I am using libextractor to actually extract the metadata from the
 file, instead of calling these external programs inside omindex based
 on the MIME type.  Using libextractor greatly simplifies omindex.

 Is anyone interested in these modifications?
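
 For readers unfamiliar with the idea, the difference between mapping by
 extension and fingerprinting by content can be sketched roughly as
 below.  This is an illustration only, not the reporter's patch: it uses
 a tiny hand-written signature table in place of libmagic, and the names
 mime_from_content, mime_from_extension and guess_mime are hypothetical
 and do not exist in omindex.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical helper (not libmagic, not omindex): guess a MIME type from
// the leading bytes of a file.  Returns an empty string if nothing matches.
std::string mime_from_content(const std::string& head) {
    struct Signature { const char* magic; std::size_t len; const char* mime; };
    // A deliberately tiny table; a real magic database is far larger.
    static const Signature signatures[] = {
        { "%PDF-", 5, "application/pdf" },
        { "\x89PNG\r\n\x1a\n", 8, "image/png" },
        { "PK\x03\x04", 4, "application/zip" },
    };
    for (const Signature& s : signatures) {
        // True only if the first s.len bytes of head equal the signature.
        if (head.compare(0, s.len, s.magic, s.len) == 0) return s.mime;
    }
    return std::string();
}

// Hypothetical helper: the hard-coded extension mapping approach.
// Returns an empty string for unknown or missing extensions.
std::string mime_from_extension(const std::string& filename) {
    static const std::map<std::string, std::string> table = {
        { "pdf", "application/pdf" },
        { "html", "text/html" },
        { "txt", "text/plain" },
    };
    std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos) return std::string();
    auto it = table.find(filename.substr(dot + 1));
    return it == table.end() ? std::string() : it->second;
}

// Prefer the content fingerprint; fall back to the extension only when
// the content is not recognised.
std::string guess_mime(const std::string& filename, const std::string& head) {
    std::string mime = mime_from_content(head);
    return mime.empty() ? mime_from_extension(filename) : mime;
}
```

 With this sketch, a PDF that has been misnamed report.txt is still
 reported as application/pdf, because the content check runs first.  The
 cost is that the start of every file has to be read, which matches the
 "slower" trade-off mentioned above.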

--

Comment:

 Did you ever get a chance to code this up?

 I'm looking at what we want to try to get into Xapian 1.1.0, and this is a
 candidate, especially if there's already a working patch!

-- 
Ticket URL: <http://trac.xapian.org/ticket/114#comment:12>
Xapian <http://xapian.org/>