[Xapian-discuss] patch proposal: omindex library or daemon

Liam xapian at networkimprov.net
Wed Oct 26 00:42:31 BST 2011


On Mon, Oct 24, 2011 at 11:10 AM, Liam <xapian at networkimprov.net> wrote:

> On Mon, Oct 24, 2011 at 10:17 AM, James Aylett <james-xapian at tartarus.org> wrote:
>
>> On 24 Oct 2011, at 05:10, Liam wrote:
>>
>> >>> void Document::set_values_and_data(const std::map<std::string,
>> >>> std::string>& fields, const std::vector<std::string>& omit_fields =
>> >>> std::vector<std::string>());
>> >>> // omit_fields is a list of field names to omit from Document values
>> >>> // might live in class MimeDocument : public Document
>> >>
>> >> I'm not so convinced by this; and it's certainly not something that I
>> >> think is needed to make a useful library around omindex.  Given the
>> >> text data from the fields, it's very easy to use TermGenerator to
>> >> index the content, or to call your own routines.
>> >
>> > Use TermGenerator? Wouldn't the user typically call
>> > Document::set_data()? Forgive my inexperience…
>>
>>
>> Document::set_data has nothing to do with terms or values (which are used
>> for searching); its typical use is as a place to store information about the
>> document that you'd use after having retrieved it from an MSet. So you might
>> put a sample there, or a pre-rendered HTML preview blob, or (as omega does)
>> a number of pieces of information that can be used to create a preview on
>> the fly.
>>
>
> Ah, ok. Looking at that part of omindex, I think you'd certainly pull that
> code into a separate source file. I can see it needn't be a library
> component, but those of us with custom indexing logic would start with that
> code and adjust as required.
>
> So who should initiate work on this patch? (2 new src files, one of which
> generates a static lib...)
>

OK, I'm gonna start on this unless someone quickly warns me to keep my rogue
paws off the defenseless code :-)

Plan is for 4 new source files:

omindex_tree.cc -- the directory/file iterator functions, providing results
    to an ostream
mime2text.cc -- most of the index_mimetype() function, providing results in
    a std::map<std::string, std::string>
omindex_db.cc -- end of the index_mimetype() function
omindex.h -- headers for the above
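
To make the mime2text.cc / omindex_db.cc split a bit more concrete, here is a rough sketch of how a caller might consume the field map. Everything named here (FieldMap, DocParts, split_fields) is hypothetical and not existing omindex code, and the "name=value" line layout for the data blob is only an assumption for illustration. It follows the omit_fields idea quoted above: omitted fields stay out of the values but still land in the data.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical names throughout; none of these exist in omindex today.
typedef std::map<std::string, std::string> FieldMap;

struct DocParts {
    FieldMap values;   // fields a caller might store as document values
    std::string data;  // text a caller might hand to Document::set_data()
};

// Split the extracted fields. Every field is appended to the data blob as a
// "name=value" line (assumed layout; a value containing a newline would need
// escaping, which this sketch does not attempt). Every field NOT listed in
// omit_fields is additionally kept as a candidate document value.
DocParts split_fields(const FieldMap& fields,
                      const std::vector<std::string>& omit_fields =
                          std::vector<std::string>())
{
    DocParts out;
    for (FieldMap::const_iterator it = fields.begin();
         it != fields.end(); ++it) {
        out.data += it->first + "=" + it->second + "\n";
        const bool omitted =
            std::find(omit_fields.begin(), omit_fields.end(), it->first)
            != omit_fields.end();
        if (!omitted)
            out.values[it->first] = it->second;
    }
    return out;
}
```

A custom indexer could then pass parts.data to Xapian::Document::set_data() and call add_value() once per entry in parts.values, with the slot numbering left to the caller; term generation stays separate, as James describes.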

Changes:

omindex.cc -- retain options processing, and not much else
Makefile -- build a static library from mime2text.cc and its dependencies

Is it OK to submit my work via pull request to the GitHub repository?