[Xapian-discuss] std::string arguments presumed to be UTF8?

Liam xapian at networkimprov.net
Mon Nov 14 18:00:01 GMT 2011


On Mon, Nov 14, 2011 at 4:20 AM, James Aylett <james-xapian at tartarus.org> wrote:

>   * std::string should never be presumed to be UTF8. Terms, for instance,
> are just treated internally as byte arrays (but are commonly used to store
> strings, hence using std::string for convenience in C++).
>
>  * The TermGenerator, and a few other pieces of Xapian, *do* act on UTF8,
> since they operate at a level that is dealing with actual characters, so
> there has to be a defined encoding.
>

So does the TermGenerator interpret a std::string argument as utf8?

If so, either that should be documented, or it should take a std::wstring.

If not: omindex converts the text it extracts for each mime-type to UTF-8,
I believe, but it doesn't use Utf8Iterator when calling
TermGenerator::index_text()...
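
To illustrate the distinction James draws between bytes and characters, here is a minimal standalone sketch. It does not use Xapian at all; the helper name utf8_code_points is made up for this example, and it assumes the input is already valid UTF-8. It only shows that a std::string has no notion of encoding: size() counts bytes, and counting characters requires choosing an encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xapian): counts Unicode code points in a
// std::string that is ASSUMED to contain valid UTF-8. No validation is done.
// UTF-8 continuation bytes have the bit pattern 10xxxxxx; every other byte
// starts a new code point, so we count the bytes that are not continuations.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For a string such as "caf\xC3\xA9" ("café" encoded as UTF-8), size() reports 5 bytes, while the helper counts 4 code points. The same five bytes read as Latin-1 would be five characters, which is why an API that works on characters needs its expected encoding stated somewhere.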

