[Xapian-tickets] [Xapian] #346: Python 3 support

Xapian nobody at xapian.org
Sun Jun 17 08:49:36 BST 2012


#346: Python 3 support
--------------------------------------+-------------------------------------
 Reporter:  olly                      |       Owner:  richard  
     Type:  defect                    |      Status:  assigned 
 Priority:  highest                   |   Milestone:  1.3.2    
Component:  Xapian-bindings (Python)  |     Version:  SVN trunk
 Severity:  normal                    |    Keywords:           
Blockedby:                            |    Platform:  All      
 Blocking:                            |  
--------------------------------------+-------------------------------------

Comment(by olly):

 > The easiest and cleanest way to do this is to avoid doing any magic
 conversions, use the bytes type in all places where binary strings may be
 used (both in parameters and in return values), and use the unicode string
 type in any places where only encoded strings may be used.

 I can see the appeal, but it's really not easiest, at least not to
 implement, since it requires specifying different std::string input
 typemaps for the two cases, whereas if we always accept either Python
 string type for any use of std::string, then a single std::string input
 typemap can do that.  It would also make wrapping new methods which
 accept std::string a chore for Python, rather than something which just
 happens automatically.
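
 As a rough illustration of the "accept either" behaviour, here is a
 Python sketch of the logic a single std::string input typemap would need
 to implement.  The name to_std_string_bytes is hypothetical and not part
 of the bindings, and encoding str as UTF-8 is an assumption.

```python
def to_std_string_bytes(value):
    """Hypothetical helper: one conversion rule for every std::string input."""
    if isinstance(value, bytes):
        # Binary data is passed through untouched.
        return value
    if isinstance(value, str):
        # Assumption: Unicode text is encoded as UTF-8 before reaching C++.
        return value.encode("utf-8")
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

 Because the rule is the same everywhere, a newly wrapped method taking
 std::string would pick it up without any per-method work.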

 > all methods which it can ever make sense to pass arbitrary binary
 strings to should accept only bytes.

 I'm not sure it's quite so clear cut though - for example, Xapian::Stem
 seems "obviously UTF-8", but as I mentioned above, you could create a user
 stemmer which expects non-UTF-8 data, and then suddenly passing binary
 data to it makes some sense.  This example is arguably a bit far-fetched,
 but the worrying part to me is that the status of Xapian::Stem potentially
 changed as the result of an addition to the C++ API (the user-defined
 stemmer feature).

 And you still end up with the situation where the API isn't consistent,
 because some places where you want to pass Unicode take only Unicode
 strings, but other places where you want to pass Unicode insist on bytes.
 You then need to know which wants which, and that makes the API a lot
 harder to learn and use.

 Really, the only consistent thing to do would be only converting bytes to
 std::string and back, but I don't see that as a great approach.  It's
 really just consistently unfriendly.

 > No magic "return unicode if the arguments passed to the call were
 unicode" or similar hackery, because this just makes the API harder to
 document and understand, and leads to subtle and hard to track down bugs.

 I wasn't proposing anything of the sort.  Pretty much all I'm suggesting
 is that we accept Unicode or bytes for std::string on input, and return
 bytes on output, with an alternative helper method which does the
 conversion for you.
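
 To make that concrete, here is a minimal Python sketch of how such a
 wrapped class might behave.  SketchDocument and get_data_str are
 hypothetical names used only for illustration (they are not the real
 bindings), and UTF-8 is assumed as the encoding in both directions.

```python
class SketchDocument:
    """Hypothetical stand-in for a wrapped class; not the real bindings."""

    def __init__(self):
        self._data = b""

    def set_data(self, data):
        # Input: accept either str (encoded as UTF-8, an assumption) or bytes.
        if isinstance(data, str):
            data = data.encode("utf-8")
        elif not isinstance(data, bytes):
            raise TypeError("expected str or bytes")
        self._data = data

    def get_data(self):
        # Output: always bytes, regardless of which type was passed in.
        return self._data

    def get_data_str(self):
        # Hypothetical helper which does the decoding for the caller.
        return self._data.decode("utf-8")
```

 The return type of get_data() never depends on the argument type, so
 there is none of the "return unicode if passed unicode" behaviour the
 quoted text objects to.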

 If we force everyone to have to learn which string type they have to pass
 where, and write explicit conversions all over the shop, all I can really
 see us achieving is a proliferation in the number of self-proclaimed
 "nicer" Python interfaces to Xapian.

-- 
Ticket URL: <http://trac.xapian.org/ticket/346#comment:33>
Xapian <http://xapian.org/>