[Xapian-discuss] Python bindings and unicode strings

Richard Boulton richard at lemurconsulting.com
Tue Sep 4 19:04:54 BST 2007


Deron Meranda wrote:
> If the core really does just deal with binary data, then it is probably
> best to just stick to "str" for current Pythons, and change to "bytes"
> for Python 3000.  From there an application can, if wanted, use
> the decode() method to get unicode out.  Plus that gives the caller
> more flexibility with dealing with possible decoding errors.

I agree.
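
A rough sketch of what that looks like on the application side, in plain
Python with no Xapian calls at all.  The byte strings below are made-up
stand-ins for whatever the bindings hand back, and to_text() is a
hypothetical helper, not part of any API.  It uses "bytes" literals, so it
is written with Python 3000 in mind:

```python
# Stand-ins for values returned by the bindings: raw bytes, not text.
raw = b"caf\xc3\xa9"      # valid UTF-8
broken = b"caf\xe9"       # Latin-1 encoded, so not valid UTF-8


def to_text(data, errors="strict"):
    """Hypothetical helper: decode UTF-8 bytes; the caller picks the error policy."""
    return data.decode("utf-8", errors)


# Well-formed data decodes cleanly.
text = to_text(raw)

# The caller may choose to substitute U+FFFD for undecodable bytes...
lenient = to_text(broken, errors="replace")

# ...or to treat bad data as an error and handle it explicitly.
try:
    to_text(broken)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

The point is only that the error policy stays in the caller's hands, which
it could not do if the bindings decoded for you.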

> I think the most immediate need is just to properly document the
> current behavior.  And really the only "surprise" in the way it works
> now is that it will silently convert unicode into UTF-8-encoded str
> objects rather than raising an error.  And just a little bit of
> documentation can smooth that out.

But it _is_ documented clearly, or so I thought: see the "Unicode" 
section of python/docs/bindings.html.  Perhaps the problem is that we 
need to make that file more visible (but then, that problem applies to 
most of our documentation, sadly).

> Having to call encode() or decode() methods is not really that
> large of a burden that we need to hide it with magic, as long as it
> is clear to the user that they need to do so if they want to work
> with unicode and not binary blobs.

Indeed.
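
For illustration, the explicit version is only a couple of lines.  This is
a sketch under assumptions: encode_for_storage() is a hypothetical
application-side helper, not something the bindings provide, and it
assumes the application has settled on UTF-8.  Again it is written against
the text/"bytes" split planned for Python 3000:

```python
def encode_for_storage(value):
    """Hypothetical helper: accept only text and return UTF-8 bytes."""
    if isinstance(value, bytes):
        # Refuse already-encoded data so nothing is encoded twice by accident.
        raise TypeError("expected text, got bytes; encode exactly once")
    return value.encode("utf-8")


# Encode on the way in, decode on the way out.
stored = encode_for_storage("na\u00efve")
restored = stored.decode("utf-8")
```

Raising on already-encoded input is a design choice for this sketch; it is
the opposite of the silent conversion discussed above.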

-- 
Richard


