[Xapian-discuss] Indexing images, .zip, .dwg and other binary files

Brian Burton dynamoeffects at gmail.com
Tue Nov 30 23:05:22 GMT 2010


I have finally cobbled together a solution, so I'll post it here for anyone
else who has this question.

1) Open the xapian-omega source directory and edit the omindex.cc file.

2) Starting at line 539 (in version 1.0.21), change these lines:
    } else {
      // Don't know how to index this type.
      cout << "unknown MIME type - skipping" << endl;
      return;
    }

to this:
    } else {
      // Unknown type: index the file's path in place of extracted content.
      dump = file;
      title = file;
      keywords = file;
      sample = file;
    }

This acts as a "catch all": omindex indexes the file by its path even when
it has no handler for the file's MIME type.

3) Around line 845, where mime_map is populated, add your extensions and
their MIME types like so:

    mime_map["jpg"] = "image/jpeg";
    mime_map["jpeg"] = "image/jpeg";
    mime_map["gif"] = "image/gif";
    mime_map["png"] = "image/png";
    mime_map["bmp"] = "image/bmp";
    mime_map["psd"] = "image/photoshop";
    mime_map["dwg"] = "application/acad";
    mime_map["mp3"] = "audio/mpeg";
    mime_map["avi"] = "video/avi";
    mime_map["mpg"] = "video/mpeg";

4) Then compile xapian-omega as you normally would.
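
For anyone who wants to see how the entries from step 3 are meant to be
used, here is a minimal standalone sketch of an extension lookup. It is not
the omindex code: the function name mime_type_for is made up for
illustration, and lowercasing the extension before the lookup is an
assumption of this sketch rather than something confirmed against
omindex.cc.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper (not part of omindex): return the MIME type mapped to
// the extension of `filename`, or an empty string if there is no extension
// or the extension has no entry in `mime_map`.
std::string mime_type_for(const std::map<std::string, std::string>& mime_map,
                          const std::string& filename) {
    std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos || dot + 1 == filename.size()) {
        return std::string();
    }
    std::string ext = filename.substr(dot + 1);
    // Assumption of this sketch: match extensions case-insensitively.
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    std::map<std::string, std::string>::const_iterator it = mime_map.find(ext);
    return it == mime_map.end() ? std::string() : it->second;
}

// Usage sketch with two of the mappings from step 3.
void example() {
    std::map<std::string, std::string> mime_map;
    mime_map["jpg"] = "image/jpeg";
    mime_map["dwg"] = "application/acad";
    assert(mime_type_for(mime_map, "plans/site.DWG") == "application/acad");
    assert(mime_type_for(mime_map, "archive.zip").empty());
}
```

In this sketch a .zip file comes back unmapped because the list in step 3
has no entry for it, so any extension you want indexed needs its own
mime_map line.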

Hope this helps someone else.

Brian

On Tue, Nov 30, 2010 at 1:29 PM, Tom <tom at lemurconsulting.com> wrote:

> Hi Brian,
>
> This is certainly possible, just by generating terms from the
> filename. But are you talking about writing an app from scratch, or
> adding this to an existing one? (I'm not sure but I think omega might
> already support this).
>
> Tom
>
> On 30 November 2010 12:01, Brian Burton <dynamoeffects at gmail.com> wrote:
> > I have scoured the documentation and mailing list but can't find what
> > seems to be an obvious question.
> >
> > I would simply like to index binary files (images, zips, dwg files) so
> > that they are included in the search results.  They wouldn't need to be
> > parsed for content, only have the filenames searchable.
> >
> > Is this possible with Xapian?  If not has anyone come up with an
> > alternate strategy?
> > _______________________________________________
> > Xapian-discuss mailing list
> > Xapian-discuss at lists.xapian.org
> > http://lists.xapian.org/mailman/listinfo/xapian-discuss
> >
>

