[Xapian-tickets] [Xapian] #346: Python 3 support

Xapian nobody at xapian.org
Tue Jun 26 20:04:41 BST 2012


#346: Python 3 support
--------------------------------------+-------------------------------------
 Reporter:  olly                      |       Owner:  richard  
     Type:  defect                    |      Status:  assigned 
 Priority:  highest                   |   Milestone:  1.3.2    
Component:  Xapian-bindings (Python)  |     Version:  SVN trunk
 Severity:  normal                    |    Keywords:           
Blockedby:                            |    Platform:  All      
 Blocking:                            |  
--------------------------------------+-------------------------------------

Comment(by barry):

 Replying to [comment:34 james]:

 > However returning bytes consistently has exactly the same problem and
 > using different functions to get the behaviour most people will expect
 > seems icky to me; the most sensible solution I've been able to come up
 > with is that you could set an encoding at the library level, with None
 > meaning use bytes. It's unclean conceptually but means you can achieve
 > whatever you need with little fuss; if you need different encodings
 > with different databases for instance you'd have to manage everything
 > manually anyway because so many things aren't associated with a
 > Database object (Query, for instance). Doubly so if you want Document
 > data to be UTF-8 but terms to be raw bytes.
 >
 > By default I'd favour no encoding, which is the behaviour Olly is
 > describing. For the purposes of getting Python 3 support for Ubuntu
 > 12.10 that seems reasonable and would be forward compatible if
 > (hopefully when) we implement output encoding in the Python layer
 > ourselves.
 >
 > This leaves user-implemented functions and the like. For these I'd
 > definitely always pass bytes, as they are more complex and rare anyway.
 > If we documented a way of getting the user-set encoding from the
 > wrapper, people could write directors that took advantage of that if
 > they wanted to make their code reusable.

 I'm no fan of global magical state, but here's one way it *could* work.
 You could expose context managers that control the global state, one for
 input and one for output.  This might actually be easier if your choices
 are UTF-8 or bytes (though other encodings could probably be supported),
 and if you pick a consistent default, e.g. always accepting and returning
 bytes.  I honestly don't know if this would make your life easier in
 practice, but it would work something like this:

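 # Default: values come back as bytes.
 value = blah.get_value()
 assert isinstance(value, bytes)

 # Inside the (proposed) context manager, values are decoded to str.
 with xapian.as_utf_8():
     value = blah.get_value()
 assert isinstance(value, str)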

 Maybe you don't need one for input, and you can auto-detect and
 auto-convert instead.  The details would of course have to be worked out,
 but these could be nested for different input and output types.  You'd
 probably also want a xapian.as_bytes() context manager to temporarily
 switch back, along with introspection methods, all implemented as a stack
 of contexts.  I've done something very similar in my flufl.i18n package,
 which has to manage a global stack of locale states.  It was a bit tricky
 to get the API right, but now it's quite useful and easily explained.

 Again, not saying it *should* work this way, just that it *could* :)

-- 
Ticket URL: <http://trac.xapian.org/ticket/346#comment:39>
Xapian <http://xapian.org/>