[Xapian-discuss] Mime2Text library, derived from omindex

Liam xapian at networkimprov.net
Fri Nov 11 08:55:27 GMT 2011


On Thu, Nov 10, 2011 at 9:19 PM, Olly Betts <olly at survex.com> wrote:

> On Wed, Nov 09, 2011 at 12:22:22PM -0800, Liam wrote:
> > I've started a branch for a patch. Would love any feedback
> >
> > Header
> >
> https://github.com/networkimprov/xapian/blob/liam_mime2text-lib/xapian-applications/omega/mime2text.h
>
> Some thoughts from looking at just the header:
>
> I don't like that this API forces you to have a file on disk.  If you're
> unpacking a container format you may well have data in memory, and while
> it might be necessary to write it to disk to pass it to some extractors,
> we really want to avoid that for extractors with a library API, or where
> we can feed data to an external filter on stdin.
>

The existing code expects a filename, so I feel we should stick with that
for the first version of this, although I agree it should take a stream
type eventually.

The ctor should be marked explicit, since there's a single argument form
> of it; we don't want to quietly promote from bool to Mime2Text.
>
> Needs documentation comments.
>

Roger those two.

Why does the public API here take const char * parameters not const
> std::string&?  If we really want const char * internally and the overhead
> of a conversion to std::string matters for the case where the user has
> a const char * to start with, we should provide overloaded forms which
> call c_str() for the user.
>

I'm already converting to std::string for the filename internally. I do
treat the type argument as a char* and don't convert it directly. Can you
look over the opening lines of Mime2Text::convert and advise me how to
proceed?

Why is "command" in Mime2Text::Fields?  It doesn't seem to be a field.
>

It's for informational purposes, what external command produced these
results, if any.

We don't expose members of a class/struct elsewhere in the public C++
> APIs we provide, so for consistency perhaps we shouldn't here.  Though
> this presumably isn't intended to go in the core API, so perhaps that
> isn't such a concern.  It does mean these values can't be lazily
> computed though (for example, the sample probably could be in some cases
> otherwise).
>

Happy to provide accessor methods.

For consistency with conventions in the other public C++ APIs (and the
> whole of the rest of the code) don't use camel case for method names or
> parameters (setCommand, setMimeType, iKey, iValue, outFields), only for
> class and struct names.
>

oops, right

Did you intend to use "protected" rather than "private" for the internal
> stuff?  That means that users can subclass to get access to it, and ties
> down the details of that...
>

I was thinking at first to make it derivable, but now think we ought to
enable that only if it becomes necessary.

There's no way to remove an entry from the command or mine maps with
> the current API (you could create your own subclass of Mime2Text to do
> this, but that's a bit mad, and I'm unconvinced those maps should be
> protected rather than private anyway).
>

You can set an entry to "". I don't think you should be able to delete it
outright.


More information about the Xapian-discuss mailing list