[Xapian-tickets] [Xapian] #114: Use libmagic or libextractor instead of own MIME mappings and extractions
Xapian
nobody at xapian.org
Tue Sep 30 08:19:03 BST 2008
#114: Use libmagic or libextractor instead of own MIME mappings and extractions
-------------------------+--------------------------------------------------
Reporter:   nemesis      |  Owner:       olly
Type:       enhancement  |  Status:      assigned
Priority:   normal       |  Milestone:   1.1.0
Component:  Omega        |  Version:     SVN HEAD
Severity:   minor        |  Resolution:
Keywords:                |  Blockedby:
Platform:   All          |  Blocking:
-------------------------+--------------------------------------------------
Changes (by olly):
* milestone: => 1.1.0
Old description:
> Hello,
>
> I first modified omindex locally to use libmagic's MIME database instead
> of hard-coding the mapping from file extension to MIME type. This makes
> the MIME types used internally more consistent with the accepted standard
> types.
>
> Then I went further: instead of using file extensions to determine the
> type, I used libmagic to fingerprint the files. This is slower, but it
> ensures that a file is identified correctly even if its extension is
> wrong.
>
> Now I am using libextractor to extract the metadata from the file,
> instead of calling external programs from omindex based on the MIME type.
> Using libextractor greatly simplifies omindex.
>
> Is anyone interested in these modifications?
New description:
Hello,

I first modified omindex locally to use libmagic's MIME database instead of
hard-coding the mapping from file extension to MIME type. This makes the
MIME types used internally more consistent with the accepted standard types.

Then I went further: instead of using file extensions to determine the type,
I used libmagic to fingerprint the files. This is slower, but it ensures
that a file is identified correctly even if its extension is wrong.

Now I am using libextractor to extract the metadata from the file, instead
of calling external programs from omindex based on the MIME type. Using
libextractor greatly simplifies omindex.

Is anyone interested in these modifications?
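
To illustrate the difference between extension-based and content-based type
detection described above, here is a minimal sketch in Python. It is not the
patch and does not call libmagic: the small signature table and the function
names (mime_from_extension, mime_from_content, detect_mime) are hypothetical
stand-ins, and only the standard-library mimetypes module is used.

```python
import mimetypes

# Hypothetical, deliberately tiny signature table standing in for the much
# larger database that libmagic consults. Each entry pairs leading bytes
# with a MIME type.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
]


def mime_from_extension(filename):
    """Guess the MIME type from the file name alone (fast, but trusts the name)."""
    mime, _encoding = mimetypes.guess_type(filename)
    return mime


def mime_from_content(head):
    """Guess the MIME type from the first bytes of the file, or return None."""
    for signature, mime in MAGIC_SIGNATURES:
        if head.startswith(signature):
            return mime
    return None


def detect_mime(filename, head):
    """Prefer the content fingerprint; fall back to the extension."""
    return mime_from_content(head) or mime_from_extension(filename)
```

With this sketch, a PDF that has been misnamed report.txt is still reported
as application/pdf, because the content check runs first. The cost is that
the start of every file has to be read, which is the slowdown mentioned
above.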
--
Comment:
Did you ever get a chance to code this up?
I'm looking at what we want to try to get into Xapian 1.1.0, and this is a
candidate, especially if there's already a working patch!
--
Ticket URL: <http://trac.xapian.org/ticket/114#comment:12>
Xapian <http://xapian.org/>