[Xapian-discuss] patch proposal: omindex library or daemon

Richard Boulton richard at tartarus.org
Mon Oct 24 12:22:44 BST 2011


On 24 October 2011 11:39, Liam <xapian at networkimprov.net> wrote:
> Yes, there we go. Also needs arguments for parse options and (optional)
> mime-type.

True; the function I suggested would probably be better as a method of
a DocumentParser class (or some better name), which would allow settings
like the MIME mappings to be supplied, and could also keep some state.
For example, I think IFilter-type stuff on Windows works best if you
maintain a persistent connection to the filters. My memory may be
inaccurate, but it seems likely that some filters could benefit from
persistent state being kept and reused for subsequent parse operations,
so the API should allow that to be implemented.
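
As a rough sketch of the shape such a class might take (every name here,
including DocumentParser, Fields, set_mime_mapping and set_handler, is
hypothetical and not part of Xapian or omindex), the parser could own both
the MIME mappings and the per-type handlers, so that whatever state a
handler carries survives between parse calls:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: none of these names exist in Xapian or omindex.
using Fields = std::map<std::string, std::string>;

class DocumentParser {
  public:
    // A handler turns raw file content into named text fields.
    using Handler = std::function<Fields(const std::string& content)>;

    // Map a file extension (without the dot) to a MIME type.
    void set_mime_mapping(const std::string& ext, const std::string& mime) {
        mime_map_[ext] = mime;
    }

    // Register the handler for a MIME type.  The handler object is stored
    // in the parser, so any state it holds persists across parse() calls.
    void set_handler(const std::string& mime, Handler handler) {
        assert(handler);  // an empty std::function would throw when called
        handlers_[mime] = std::move(handler);
    }

    // Parse content into fields.  If mime is empty, it is guessed from the
    // filename extension.  Returns no fields when no handler is registered.
    Fields parse(const std::string& filename, const std::string& content,
                 const std::string& mime = std::string()) {
        const std::string type = mime.empty() ? guess_mime(filename) : mime;
        auto it = handlers_.find(type);
        if (it == handlers_.end()) return Fields();
        return it->second(content);
    }

  private:
    // Look up the text after the last '.' in the MIME mappings.
    std::string guess_mime(const std::string& filename) const {
        const std::string::size_type dot = filename.rfind('.');
        if (dot == std::string::npos) return std::string();
        auto it = mime_map_.find(filename.substr(dot + 1));
        return it == mime_map_.end() ? std::string() : it->second;
    }

    std::map<std::string, std::string> mime_map_;
    std::map<std::string, Handler> handlers_;
};
```

Because the handler object lives inside the parser, a filter that wants to
hold a connection open can do so for as long as the parser exists; the
caller just keeps one parser around and calls parse() repeatedly.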

> There's a second routine which does the default Document ops for values &
> data:
>
>  void Document::set_values_and_data(const std::map<std::string,
> std::string>& fields, const std::vector<std::string>& omit_fields=0);
>  // omit_fields is a list of field names to omit from Document values
>  // might live in class MimeDocument : public Document

I'm not so convinced by this, and it's certainly not something that I
think is needed to make a useful library around omindex.  Given the
text data from the fields, it's very easy to use TermGenerator to
index the content, or to call your own routines.

I'm not at all convinced there's a good case for having a MimeDocument
class (or at least, not as a subclass of Document), but I'm also not
sure what you're thinking its use is.  A DocumentFields class of some
kind to help manage document data as suggested in ticket #53 could be
a useful addition to core Xapian, but I don't think that's quite what
you're thinking of (and anyway, our most recent thinking (from 4 years
ago, ahem!) is that this might be best done as methods of
Xapian::Document).

To summarise: getting the data out of arbitrary documents as a set of
fields seems like a good aim for a library.  Hardcoding some default
indexing behaviours for that data seems like feature creep,
particularly since I usually find myself wanting custom behaviours.

-- 
Richard


