[Xapian-discuss] patch proposal: omindex library or daemon

Liam xapian at networkimprov.net
Sat Nov 5 07:43:45 GMT 2011


On Wed, Nov 2, 2011 at 7:21 PM, Liam <xapian at networkimprov.net> wrote:

> On Wed, Nov 2, 2011 at 6:18 PM, Olly Betts <olly at survex.com> wrote:
>
>>
>> But overall, I think it's probably simpler if you just work on whatever
>> you want to achieve, and we can decide what to do about it once there's
>> actually something to look at.  A long email discussion about what the
>> API(s) should be is all very lovely, but a (mostly) working
>> implementation of them gives much better insights into what actually
>> works.
>>
>
> Thanks for your feedback. I'm afraid I clouded the issues somewhat because
> I was coming up to speed on how everything works; apologies.
>
> All I need is to copy/paste the mime-file conversion logic into a
> separate, non-shared library, with a single API call. This is the code in
> index_mimetype() that takes a filename and produces a set of plain-text
> strings (author, title, sample, keywords, dump, md5), plus the mime_map
> defined in main(). Also, the makefile entry for this library would
> reference the external sources it depends on.
>
> Enabling zip-file unpacking in the mime converter in future would be
> great; I don't think this change would conflict with that. For a group of
> documents in an archive file, the library function could return a table of
> results instead of a single set of strings.
>
> I will have some questions as I'm working on this... I'd like guidance on
> how to port DirectoryIterator::file_to_string() to the new library, since
> it wouldn't use DirectoryIterator.
>
> Assuming that sounds rational to you, I'll get started!
>

I've pushed a working draft of this to my Xapian fork on GitHub. What do
you think?

See the commit 'mime2text first draft' on this branch:
https://github.com/networkimprov/xapian/commits/liam_mime2text-lib

In a separate commit, I took the liberty of cleaning up two stray 'using
std::...' statements in htmlparse.h and fixing its dependents.
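
For anyone skimming the thread, here is a rough illustration of the shape
of the single-call interface described in the quoted message: a filename
goes in, a set of plain-text fields comes out, and the type is picked via
an extension-to-MIME map. This is only a sketch and is not taken from the
branch. The names Mime2TextFields, default_mime_map, lookup_mimetype,
read_file and mime2text_convert are all hypothetical, only text/plain is
handled, and the md5 field is left empty.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <fstream>
#include <iterator>
#include <map>
#include <string>

// Hypothetical result record holding the plain-text fields named in the
// quoted message. Field names are illustrative only.
struct Mime2TextFields {
    std::string mimetype;
    std::string author, title, sample, keywords, dump, md5;
};

// Hypothetical stand-in for the mime_map built in omindex's main():
// lower-case file extension -> MIME type. Only a few entries are shown.
std::map<std::string, std::string> default_mime_map() {
    std::map<std::string, std::string> m;
    m["txt"] = "text/plain";
    m["html"] = "text/html";
    m["pdf"] = "application/pdf";
    return m;
}

// Return the MIME type for a filename based on its extension, or an empty
// string if there is no extension or the extension is not in the map.
std::string lookup_mimetype(const std::map<std::string, std::string>& mime_map,
                            const std::string& filename) {
    std::string::size_type dot = filename.rfind('.');
    if (dot == std::string::npos || dot + 1 == filename.size())
        return std::string();
    // A dot inside a directory name is not a file extension.
    std::string::size_type slash = filename.find_last_of("/\\");
    if (slash != std::string::npos && slash > dot)
        return std::string();
    std::string ext = filename.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    std::map<std::string, std::string>::const_iterator it = mime_map.find(ext);
    return it == mime_map.end() ? std::string() : it->second;
}

// Read a whole file into a string without depending on DirectoryIterator.
// An assumed replacement for file_to_string(), not the real one.
bool read_file(const std::string& filename, std::string& out) {
    std::ifstream in(filename.c_str(), std::ios::in | std::ios::binary);
    if (!in) return false;
    out.assign(std::istreambuf_iterator<char>(in),
               std::istreambuf_iterator<char>());
    return true;
}

// The single API call: fill `out` for `filename`. Returns false if the
// type is unknown or unsupported by this sketch, or the file is unreadable.
bool mime2text_convert(const std::string& filename,
                       const std::map<std::string, std::string>& mime_map,
                       Mime2TextFields& out) {
    out = Mime2TextFields();
    out.mimetype = lookup_mimetype(mime_map, filename);
    // A real library would dispatch to a converter per MIME type here.
    if (out.mimetype != "text/plain") return false;
    if (!read_file(filename, out.dump)) return false;
    out.sample = out.dump.substr(0, 100);  // short preview of the text
    return true;
}
```

A caller would build the map once with default_mime_map(), then call
mime2text_convert() per file and index the returned fields. Returning a
table of these records for archive files would fit the same pattern.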

