[Xapian-discuss] Python bindings and unicode strings
Richard Boulton
richard at lemurconsulting.com
Tue Sep 4 19:04:54 BST 2007
Deron Meranda wrote:
> If the core really does just deal with binary data, then it is probably
> best to just stick to "str" for current Pythons, and change to "bytes"
> for Python 3000. From there an application can, if wanted, use
> the decode() method to get unicode out. Plus that gives the caller
> more flexibility with dealing with possible decoding errors.
I agree.
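
As a rough illustration of that flexibility, here is a minimal sketch in plain Python. It makes no Xapian calls, and it uses Python 3 names (bytes for binary data, str for text). The sample values and the decode_term helper are hypothetical, not part of the bindings.

```python
# Sample binary data standing in for what a binary-only core might hand back.
# The value is made up for illustration.
raw = "caf\u00e9".encode("utf-8")   # b'caf\xc3\xa9'

# Simulate damaged data by cutting the last multi-byte sequence in half.
damaged = raw[:-1]                  # b'caf\xc3'


def decode_term(data, errors="strict"):
    """Decode UTF-8 bytes to text, letting the caller pick the error policy.

    Hypothetical helper: 'strict' raises UnicodeDecodeError on bad input,
    'replace' substitutes U+FFFD, and 'ignore' drops the bad bytes.
    """
    return data.decode("utf-8", errors)


clean_text = decode_term(raw)                     # well-formed input
replaced_text = decode_term(damaged, "replace")   # bad tail becomes U+FFFD
ignored_text = decode_term(damaged, "ignore")     # bad tail is dropped
```

Because the caller does the decoding, the caller also chooses whether bad input is an error, is replaced, or is skipped.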
> I think the most immediate need is just to properly document the
> current behavior. And really the only "surprise" in the way it works
> now is that it will silently convert unicode into UTF-8-encoded str
> objects rather than raising an error. And just a little bit of
> documentation can smooth that out.
But it _is_ documented clearly, or so I thought: see the "Unicode"
section of python/docs/bindings.html. Perhaps the problem is that we
need to make that file more visible (though, sadly, that problem
applies to most of our documentation).
> Having to call encode() or decode() methods is not really that
> large of a burden that we need to hide it with magic, as long as it
> is clear to the user that they need to do so if they want to work
> with unicode and not binary blobs.
Indeed.
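
For comparison, this is roughly what the explicit, no-magic style looks like on the way in. It is only a sketch in plain Python with Python 3 names (bytes for binary, str for text). The require_bytes helper is hypothetical and does not describe what the bindings currently do, which is to convert silently.

```python
def require_bytes(value):
    """Accept only binary data; refuse text instead of converting it silently.

    Hypothetical helper showing the "raise an error" alternative.
    """
    if not isinstance(value, bytes):
        raise TypeError(
            "expected bytes, got %s; call .encode('utf-8') first"
            % type(value).__name__
        )
    return value


text = "na\u00efve"

# The caller encodes explicitly, so the choice of encoding is visible.
encoded = require_bytes(text.encode("utf-8"))

# Passing the text object directly is rejected rather than converted.
try:
    require_bytes(text)
    rejected = False
except TypeError:
    rejected = True
```

The cost to the caller is one encode() call per string, and in return the encoding is never chosen behind their back.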
--
Richard