[Xapian-discuss] std::string arguments presumed to be UTF8?
Liam
xapian at networkimprov.net
Mon Nov 14 18:00:01 GMT 2011
On Mon, Nov 14, 2011 at 4:20 AM, James Aylett <james-xapian at tartarus.org> wrote:
> * std::string should never be presumed to be UTF8. Terms, for instance,
> are just treated internally as byte arrays (but are commonly used to store
> strings, hence using std::string for convenience in C++).
>
> * The TermGenerator, and a few other pieces of Xapian, *do* act on UTF8,
> since they operate at a level that is dealing with actual characters, so
> there has to be a defined encoding.
>
So does the TermGenerator interpret a std::string argument as UTF-8?
If so, either that should be documented, or it should take a std::wstring.
If not: omindex converts each MIME type's content to UTF-8, I believe, but it
doesn't use Utf8Iterator when calling TermGenerator::index_text()...
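
To make the byte-array vs. character distinction concrete, here is a minimal
sketch in plain C++. It is not Xapian code: count_code_points() is a
hypothetical helper written only for illustration, and it assumes its input is
already well-formed UTF-8 (it does no validation). The point is that
std::string::size() reports bytes, while anything that works on characters has
to decode the bytes under some agreed encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xapian): counts UTF-8 code points by
// skipping continuation bytes, i.e. bytes of the form 10xxxxxx.
// Assumes well-formed UTF-8; malformed input is not detected.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        // Every byte that is NOT a continuation byte starts a code point.
        if ((c & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}
```

For example, the string "caf\xC3\xA9" (the word "café" with U+00E9 encoded as
two bytes) has a size() of 5 but holds 4 code points, so code treating the
std::string as raw bytes and code treating it as UTF-8 text see different
things in the same object.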