[Xapian-discuss] std::string arguments presumed to be UTF8?
James Aylett
james-xapian at tartarus.org
Mon Nov 14 12:20:11 GMT 2011
On 14 Nov 2011, at 11:54, Liam wrote:
> I see that TermGenerator::index_text() can take a Utf8Iterator argument,
> but Document::add_term() etc simply take a std::string.
>
> Are std::string arguments presumed to be UTF8 strings? If "sometimes,"
> where or where not?
I believe the situation is as follows:
* A std::string argument should never be presumed to be UTF-8. Terms, for instance, are treated internally as plain byte arrays; they are commonly used to store text, which is why the C++ API uses std::string for convenience.
* The TermGenerator, and a few other pieces of Xapian, *do* act on UTF-8: they operate at the level of actual characters, so there has to be a defined encoding.
Unfortunately, this isn't terribly clear from the documentation.
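To make the byte/character distinction concrete, here is a minimal sketch in plain C++. It deliberately does not call Xapian; count_code_points and the two cafe_* constants are names invented for this example, and the helper assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xapian): counts UTF-8 code points by
// skipping continuation bytes, i.e. bytes of the form 10xxxxxx.
// Assumes well-formed UTF-8; it performs no validation.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or ASCII byte starts a new code point
        }
    }
    return n;
}

// The word "café" in two encodings. std::string stores either one as-is,
// because it is only a container of bytes with no encoding attached.
const std::string cafe_utf8   = "caf\xC3\xA9";  // é as two UTF-8 bytes
const std::string cafe_latin1 = "caf\xE9";      // é as one ISO-8859-1 byte
```

Here cafe_utf8.size() is 5 while count_code_points(cafe_utf8) is 4, and the two strings compare unequal. If terms really are stored as the bytes given, as I believe, then passing each of these to Document::add_term() would give two distinct terms, whereas code that works on characters has to know which encoding it is reading.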
J
--
James Aylett
talktorex.co.uk - xapian.org - devfort.com