[Xapian-tickets] [Xapian] #346: Python 3 support
Xapian
nobody at xapian.org
Tue Jun 26 20:04:41 BST 2012
#346: Python 3 support
--------------------------------------+-------------------------------------
  Reporter:  olly                     |     Owner:  richard
      Type:  defect                   |    Status:  assigned
  Priority:  highest                  | Milestone:  1.3.2
 Component:  Xapian-bindings (Python) |   Version:  SVN trunk
  Severity:  normal                   |  Keywords:
 Blockedby:                           |  Platform:  All
  Blocking:                           |
--------------------------------------+-------------------------------------
Comment(by barry):
Replying to [comment:34 james]:
> However, returning bytes consistently has exactly the same problem, and
> using different functions to get the behaviour most people will expect
> seems icky to me; the most sensible solution I've been able to come up
> with is that you could set an encoding at the library level, with None
> meaning use bytes. It's unclean conceptually but means you can achieve
> whatever you need with little fuss; if you need different encodings
> with different databases, for instance, you'd have to manage everything
> manually anyway, because so many things aren't associated with a Database
> object (Query, for instance). Doubly so if you want Document data to be
> UTF-8 but terms to be raw bytes.
>
> By default I'd favour no encoding, which is the behaviour Olly is
> describing. For the purposes of getting Python 3 support for Ubuntu 12.10
> that seems reasonable, and it would be forward compatible if (hopefully
> when) we implement output encoding in the Python layer ourselves.
>
> This leaves user-implemented functions and the like. For these I'd
> definitely always pass bytes, as they are more complex and rare anyway. If
> we documented a way of getting the user-set encoding from the wrapper,
> people could write directors that took advantage of that if they wanted to
> make their code reusable.
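
A minimal sketch of the library-level setting described in the quoted comment, with None meaning raw bytes as the default. The names `set_encoding()`, `get_encoding()`, `wrap_output()` and `PrefixDecider` are hypothetical and invented for illustration; none of them exist in the xapian module. `wrap_output()` stands in for what a wrapped method might do with a string returned from C++, and `PrefixDecider` stands in for a director-style callback that is always handed bytes but can consult the documented setting:

```python
import codecs
from typing import Optional, Union

# Hypothetical sketch only: xapian does not provide these functions.
# None means "hand back raw bytes", the proposed default.
_encoding: Optional[str] = None


def set_encoding(encoding: Optional[str]) -> None:
    """Set the library-wide output encoding; None restores raw bytes."""
    global _encoding
    if encoding is not None:
        codecs.lookup(encoding)  # raises LookupError for an unknown codec
    _encoding = encoding


def get_encoding() -> Optional[str]:
    """Documented accessor so user code (e.g. directors) can see the setting."""
    return _encoding


def wrap_output(raw: bytes) -> Union[bytes, str]:
    """What a wrapped getter might do with a string returned from C++."""
    return raw if _encoding is None else raw.decode(_encoding)


class PrefixDecider:
    """Director-style callback (hypothetical): always receives bytes."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix

    def __call__(self, value: bytes) -> bool:
        encoding = get_encoding()
        if encoding is None:
            # No library-wide encoding: compare bytes, assuming a UTF-8 prefix.
            return value.startswith(self.prefix.encode("utf-8"))
        return value.decode(encoding).startswith(self.prefix)
```

With the default, `wrap_output(b"caf\xc3\xa9")` returns the bytes unchanged; after `set_encoding("utf-8")` it returns the str `"café"`. The callback behaves the same either way, which is the point of letting reusable director code read the setting rather than receive different types.
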

I'm no fan of global magical state, but here's one way it *could* work.
You could expose context managers which control the global state, one for
input and one for output. This might actually be easier if your choices
are UTF-8 or bytes (though other encodings could probably be supported),
and if you make a consistent default choice, e.g. always accepting and
returning bytes. I honestly don't know if in practice this would make
your life easier, but it would work something like this:
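
    # Default behaviour: values come back as bytes.
    value = blah.get_value()
    assert isinstance(value, bytes)
    # Inside the proposed context manager, values are decoded to str.
    with xapian.as_utf_8():
        value = blah.get_value()
        assert isinstance(value, str)
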
Maybe you don't need one for input, and can auto-detect and auto-convert
instead. The details would of course have to be worked out, but it would
be possible to nest these for different input and output types. You'd
probably also want a xapian.as_bytes() context manager to switch back
temporarily, along with introspection methods, all implemented as a stack
of contexts. I've done something very similar with my flufl.i18n package,
which has to manage a global stack of locale states. It was a bit tricky
to get the API right, but now it's quite useful and easily explained.

Again, not saying it *should* work this way, just that it *could* :)
--
Ticket URL: <http://trac.xapian.org/ticket/346#comment:39>
Xapian <http://xapian.org/>