[Xapian-discuss] std::string arguments presumed to be UTF8?
Olly Betts
olly at survex.com
Tue Nov 15 02:20:32 GMT 2011
On Mon, Nov 14, 2011 at 12:20:11PM +0000, James Aylett wrote:
> * std::string should never be presumed to be UTF8. Terms, for
> instance, are just treated internally as byte arrays (but are
> commonly used to store strings, hence using std::string for
> convenience in C++).
>
> * The TermGenerator, and a few other pieces of Xapian, *do* act on
> UTF8, since they operate at a level that is dealing with actual
> characters, so there has to be a defined encoding.
Yes, that's spot on - if we need to look at characters, then the
encoding matters and should be UTF-8. Otherwise you can put any byte
sequence in the input.
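
To illustrate the distinction, here is a minimal sketch in standard C++
only (it makes no Xapian calls). The helper name is_valid_utf8 is
hypothetical, not part of the Xapian API. It shows that a std::string
can hold arbitrary bytes, including NUL and 0xFF, and how an application
might check that text is well-formed UTF-8 before handing it to a
component that works on characters:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xapian): returns true if `s` is
// well-formed UTF-8. Rejects stray continuation bytes, truncated
// sequences, overlong encodings, surrogates and values above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;   // total bytes in this sequence
        unsigned long cp = 0;  // code point decoded so far
        if (c < 0x80) {
            len = 1; cp = c;
        } else if (c >= 0xC2 && c <= 0xDF) {
            len = 2; cp = c & 0x1F;
        } else if ((c & 0xF0) == 0xE0) {
            len = 3; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            len = 4; cp = c & 0x07;
        } else {
            // Continuation byte in lead position, overlong lead
            // (0xC0, 0xC1), or a byte that never appears in UTF-8.
            return false;
        }
        if (len > n - i) return false;  // sequence runs off the end
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (len == 3 && cp < 0x800) return false;            // overlong
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;      // surrogate
        i += len;
    }
    return true;
}
```

For example, std::string("XID\0\x01\xFF", 6) is a perfectly good
6-byte std::string and could serve as an opaque byte sequence, but
is_valid_utf8 returns false for it, whereas "caf\xC3\xA9" passes the
check.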
> Unfortunately, this isn't terribly clear from the documentation.
There's already a note about improving that at:
http://trac.xapian.org/wiki/MissingDocumentation
(It may appear that the list just gets longer, but we are addressing
things from it; more just get added too...)
Cheers,
Olly