[Xapian-tickets] [Xapian] #346: Python 3 support
Xapian
nobody at xapian.org
Sun Jun 17 08:49:36 BST 2012
#346: Python 3 support
--------------------------------------+-------------------------------------
Reporter: olly                        | Owner: richard
Type: defect                          | Status: assigned
Priority: highest                     | Milestone: 1.3.2
Component: Xapian-bindings (Python)   | Version: SVN trunk
Severity: normal                      | Keywords:
Blockedby:                            | Platform: All
Blocking:                             |
--------------------------------------+-------------------------------------
Comment(by olly):
> The easiest and cleanest way to do this is to avoid doing any magic
> conversions, use the bytes type in all places where binary strings may be
> used (both in parameters and in return values), and use the unicode string
> type in any places where only encoded strings may be used.
I can see the appeal, but it's really not the easiest, at least not to
implement, since it requires specifying different std::string input
typemaps for the two cases, whereas if we always accept either Python
string type for any use of std::string, then a single std::string input
typemap can do that. It would also make wrapping new methods which accept
std::string a chore for Python, rather than something that just happens
automatically.
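To illustrate the "accept either string type" behaviour, here is a minimal
Python sketch of the coercion a single input typemap would perform. It is not
the actual SWIG typemap: the name to_std_string is hypothetical, and encoding
str as UTF-8 is an assumption made for the example.

```python
def to_std_string(value):
    """Return the bytes a std::string parameter would receive.

    Hypothetical helper: accepts either Python 3 string type.
    """
    if isinstance(value, str):
        # Assumption: Unicode text is encoded as UTF-8.
        return value.encode("utf-8")
    if isinstance(value, (bytes, bytearray, memoryview)):
        # Binary data is passed through unchanged.
        return bytes(value)
    raise TypeError(
        "expected str or bytes, got " + type(value).__name__
    )
```

With one rule like this applied everywhere, a newly wrapped method taking
std::string needs no per-method decision about which Python type to accept.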
> all methods which it can ever make sense to pass arbitrary binary
> strings to should accept only bytes.
I'm not sure it's quite so clear-cut though - for example, Xapian::Stem
seems "obviously UTF-8", but as I mentioned above, you could create a user
stemmer which expects non-UTF-8 data, and then suddenly passing binary
data to it makes some sense. This example is arguably a bit far-fetched,
but the worrying part to me is that the status of Xapian::Stem potentially
changed as the result of an addition to the C++ API (the user-defined
stemmer feature).
And you still end up with a situation where the API isn't consistent,
because some places where you want to pass Unicode take only Unicode
strings, but other places where you want to pass Unicode insist on bytes.
You need to know which wants which, and that makes the API a lot harder to
learn and use.
Really, the only consistent thing to do would be only converting bytes to
std::string and back, but I don't see that as a great approach. It's
really just consistently unfriendly.
> No magic "return unicode if the arguments passed to the call were
> unicode" or similar hackery, because this just makes the API harder to
> document and understand, and leads to subtle and hard to track down bugs.
I wasn't proposing anything of the sort. Pretty much all I'm suggesting
is that we accept Unicode or bytes for std::string on input, and return
bytes on output, with an alternative helper method which does the
conversion for you.
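As a rough sketch of what that suggestion looks like from the caller's side,
consider the toy class below. It is a stand-in and not part of the Xapian
bindings: the names Store, put, get and get_text are hypothetical, and UTF-8
is assumed as the encoding.

```python
class Store:
    """Toy stand-in for a wrapped C++ class holding a std::string."""

    def __init__(self):
        self._data = b""

    def put(self, data):
        # Input: accept Unicode (encoded as UTF-8, an assumption) or bytes.
        if isinstance(data, str):
            self._data = data.encode("utf-8")
        else:
            self._data = bytes(data)

    def get(self):
        # Output: always bytes, regardless of what was passed in.
        return self._data

    def get_text(self):
        # Helper: does the decode for callers who know the data is text.
        return self._data.decode("utf-8")
```

The return type of get() never depends on the argument type passed to put(),
and get_text() raises UnicodeDecodeError if the stored data is not valid UTF-8.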
If we force everyone to have to learn which string type they have to pass
where, and write explicit conversions all over the shop, all I can really
see us achieving is a proliferation in the number of self-proclaimed
"nicer" Python interfaces to Xapian.
--
Ticket URL: <http://trac.xapian.org/ticket/346#comment:33>
Xapian <http://xapian.org/>