[Xapian-discuss] omega and "text/x-mail" support

Emmanuel Garette egarette at cadoles.com
Tue Dec 16 21:04:47 GMT 2014


On 15/12/2014 23:22, Olly Betts wrote:
> On Sat, Dec 13, 2014 at 08:32:58PM +0100, Emmanuel Garette wrote:
>> I would like to add "text/x-mail" support to omega. I'm using mhonarc to
>> export mail to HTML format and I'm using the HTML parser to index mail
>> content (largely inspired by the "application/vnd.ms-outlook" format).
>>
>> The problem is that files attached to the mail are not indexed at all.
>> I think it's not possible in "index_file" function to index 2 files as
>> one document.
>>
>> I can't find an easy solution to my problem. I think I must split this
>> function to separate document creation from file indexing.
> I've done some work on indexing attachments and files inside archives
> (like tar and zip files), but I haven't merged it yet as it's not
> entirely satisfactory in various ways, most of which require some
> refactoring of omindex to address.
>
> The approach I took to attachments was to index them as separate
> documents - if I follow you correctly, you seem to be trying to treat
> them as part of a single document.  Is there a particular reason why
> you are taking that approach?
>
> I don't think my code is anywhere public currently, but I can rebase
> it onto current master and put it on a git branch if it's potentially
> useful to others in its current form.
In my opinion, one file is one document, but maybe I'm wrong.
The problem is that we cannot construct the path term (prefixed by U) in
this case. How do we deal with the path if one email can generate more
than one document? Something like "U/path/to/mail|Attached.pdf"? Or could
we add a new prefix?
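
To make the question concrete, here is a minimal Python sketch of the
"U/path/to/mail|Attached.pdf" idea. The function name `attachment_term`,
the "|" separator and the percent-encoding step are all assumptions made
for illustration; nothing here is taken from omindex itself.

```python
from urllib.parse import quote


def attachment_term(mail_path, attachment_name, sep="|"):
    """Build a hypothetical unique term for one attachment of a mail.

    The result has the shape U<mail path><sep><attachment name>.
    """
    # Percent-encode both parts so that the separator character can only
    # appear once in the term, between the mail path and the attachment.
    # Slashes are kept in the mail path so it stays readable.
    mail = quote(mail_path, safe="/")
    attachment = quote(attachment_name, safe="")
    return "U" + mail + sep + attachment


if __name__ == "__main__":
    print(attachment_term("/path/to/mail", "Attached.pdf"))
```

With this scheme each attachment would get its own term, while every
document generated from the same mail still shares the "U/path/to/mail"
part as a common start of its term.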

I'm interested in your work on indexing archives, to understand how you
expect to build the path.

Regards,
>
> Cheers,
>     Olly


-- 
Emmanuel Garette
Ingénieur logiciels libres

Cadoles (http://www.cadoles.com)
Experts EOLE, Gaspacho, logiciels libres



