[Xapian-tickets] [Xapian] #346: Python 3 support

Xapian nobody at xapian.org
Fri Jun 15 02:12:39 BST 2012


#346: Python 3 support
--------------------------------------+-------------------------------------
 Reporter:  olly                      |       Owner:  richard  
     Type:  defect                    |      Status:  assigned 
 Priority:  highest                   |   Milestone:  1.3.2    
Component:  Xapian-bindings (Python)  |     Version:  SVN trunk
 Severity:  normal                    |    Keywords:           
Blockedby:                            |    Platform:  All      
 Blocking:                            |  
--------------------------------------+-------------------------------------

Comment(by olly):

 There are certainly some useful changes in Sabrina's patch which we
 should hopefully get in for the 1.3.2 development snapshot, such as the
 PEP3147 support (though that should use {{{imp.get_tag()}}} rather than
 hard-coding cpython-32) and probably the {{{__next__}}} rename (though
 I've not looked at the reasons for that).  However, I'm afraid it doesn't
 address the major remaining issue (Unicode strings) in the right way.  If
 we had a patch which fixed all the remaining issues, we would have applied
 it by now!  The patch changes the testcases to match the output it
 produces, so passing the tests is largely meaningless.
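
 As a rough illustration of those two smaller changes (this is not code
 from the patch): on Python 3.2, {{{imp.get_tag()}}} returns the
 interpreter's cache tag.  The {{{imp}}} module was later deprecated and
 removed in Python 3.12, with {{{sys.implementation.cache_tag}}} holding the
 same value, so the sketch below uses that.  {{{TermIter}}} and the source
 path are made-up stand-ins.

```python
import importlib.util
import os
import sys

# The tag naming this interpreter in PEP 3147 cache file names,
# e.g. "cpython-32" on CPython 3.2.  Query it rather than hard-coding it.
tag = sys.implementation.cache_tag

# Where the byte-compiled file for a (hypothetical) source path would live.
pyc_path = importlib.util.cache_from_source(
    os.path.join("xapian", "__init__.py"))


class TermIter:
    """Hypothetical stand-in for a wrapped iterator class."""

    def __init__(self, terms):
        self._it = iter(terms)

    def __iter__(self):
        return self

    def __next__(self):      # Python 3 iterator protocol
        return next(self._it)

    next = __next__          # Python 2 spelling, kept as an alias


if __name__ == "__main__":
    print(tag, pyc_path)
    print(list(TermIter(["a", "b"])))
```

 The alias line is only needed if the same class has to work on both
 Python 2 and Python 3.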

 The fundamental issue (as mentioned in the bug description) is that the
 Xapian C++ API uses std::string as both a UTF-8 string and a byte string.
 Some methods will only ever return one of these - e.g.
 Xapian::sortable_serialise() always returns a byte string, while
 Xapian::Stem::operator() always returns a UTF-8 string (well, unless you
 create a user stemming algorithm which doesn't...), but some can return
 either, generally depending on what you stored earlier (e.g.
 Xapian::Document::get_value()).  Similarly, some methods which take
 strings can take only one sort, or either, but in this case we can just
 handle whichever we are passed when the C++ API accepts a std::string.
 The key difference is that for a return value, we have to pick a Python
 type to return.
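
 A minimal sketch of that asymmetry in plain Python, without using the
 bindings.  The helper names {{{to_std_string}}} and {{{from_std_string}}}
 are invented for illustration, and the byte values are arbitrary stand-ins
 for what a binary-returning method might produce.

```python
def to_std_string(value):
    """Input side: accept either sort and hand C++ the raw bytes."""
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, (bytes, bytearray)):
        return bytes(value)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


def from_std_string(raw, is_text):
    """Output side: the wrapper has to commit to one Python type."""
    return raw.decode("utf-8") if is_text else raw


utf8_value = to_std_string("caf\u00e9")         # text, stored as UTF-8
binary_value = to_std_string(b"\xc0\x80\xff")   # bytes, not valid UTF-8

# Text can be returned as str, binary data has to stay bytes.
print(type(from_std_string(utf8_value, is_text=True)).__name__)
print(type(from_std_string(binary_value, is_text=False)).__name__)

try:
    from_std_string(binary_value, is_text=True)
except UnicodeDecodeError as e:
    print("cannot return as Unicode:", e.reason)
```

 The input helper never needs to know which sort it was given, whereas the
 output helper needs the {{{is_text}}} decision made for it in advance.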

 So to fix this, for each API method which returns std::string we need to
 decide whether it returns Unicode, bytes, or both.  If it's both, the best
 solution is probably to add a second form (e.g.
 xapian.Document.get_value_unicode()) which does the conversion for the
 user, rather than forcing them to sprinkle explicit conversions around
 calls to xapian.Document.get_value() in their code.  SWIG's %extend makes
 this pretty easy to do.
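
 To make the shape of that concrete, here is a pure-Python mock rather than
 the SWIG {{{%extend}}} code itself.  {{{FakeDocument}}} is a stand-in for
 the wrapped class, and {{{get_value_unicode()}}} is the name floated above,
 not an existing method.

```python
class FakeDocument:
    """Stand-in for the wrapped class; values are stored as raw bytes."""

    def __init__(self):
        self._values = {}

    def add_value(self, slot, value):
        # Inputs can be either sort; store the underlying bytes.
        if isinstance(value, str):
            value = value.encode("utf-8")
        self._values[slot] = bytes(value)

    def get_value(self, slot):
        """Always return bytes (empty if the slot is unset in this mock)."""
        return self._values.get(slot, b"")

    def get_value_unicode(self, slot):
        """Convenience form: decode the stored bytes as UTF-8."""
        return self.get_value(slot).decode("utf-8")


doc = FakeDocument()
doc.add_value(0, "title text")
doc.add_value(1, b"\xc0\x80\xff")   # binary data, not valid UTF-8

print(doc.get_value(0))
print(doc.get_value_unicode(0))
```

 Calling {{{doc.get_value_unicode(1)}}} here raises
 {{{UnicodeDecodeError}}}, which is the caller's cue that the slot holds
 binary data and {{{get_value()}}} is the right form to use.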

 Or perhaps the standard should be for get_value() to return Unicode with a
 get_value_bytes() or get_value_raw() alternative.  Or perhaps what we do
 should depend on how the method will usually be used (e.g. terms can be
 arbitrary binary strings, but in practice they're almost always UTF-8).
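
 The opposite default could look something like this sketch, again as a
 pure-Python mock.  {{{TextFirstDocument}}} is invented, and
 {{{struct.pack}}} is used only to produce some binary data; it is not the
 format {{{sortable_serialise()}}} produces.

```python
import struct


class TextFirstDocument:
    """Mock where get_value() returns str and a _bytes form is the escape."""

    def __init__(self, values):
        self._values = dict(values)   # slot -> bytes

    def get_value_bytes(self, slot):
        return self._values.get(slot, b"")

    def get_value(self, slot):
        # The common case is assumed to be UTF-8 text.
        return self.get_value_bytes(slot).decode("utf-8")


doc = TextFirstDocument({
    0: "r\u00e9sum\u00e9".encode("utf-8"),   # text value
    1: struct.pack(">d", 1.5),               # binary value
})

text = doc.get_value(0)                       # str
(number,) = struct.unpack(">d", doc.get_value_bytes(1))
print(type(text).__name__, number)
```

 With this convention the common text case needs no conversion at the call
 site, but {{{get_value(1)}}} fails with {{{UnicodeDecodeError}}}, so code
 handling binary slots has to know to call the bytes form.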

 I guess if you're trying to get everyone onto Python 3 for Ubuntu, you've
 looked at quite a few upstreams already - has a standard pattern for
 resolving such situations already emerged?

 One further complication may be the user sub-classable API classes (which
 SWIG calls "directors").  Here C++ calls back to Python, so it's the
 arguments rather than the return types which matter.  I'm not sure if
 there are any cases there which could take either Unicode or bytes, but if
 there are, I think we probably have to always pass bytes and let the
 Python subclass explicitly convert if it wants to.
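
 A small mock of that callback direction, with invented names throughout:
 {{{FakeStopper}}} stands in for a director base class, and
 {{{call_from_cpp()}}} mimics the generated wrapper handing a std::string
 argument to the Python override as bytes.

```python
class FakeStopper:
    """Stand-in for a director base class that C++ would call into."""

    def __call__(self, term):
        raise NotImplementedError


def call_from_cpp(callback, std_string):
    """Mimic the wrapper: a std::string argument always arrives as bytes."""
    return callback(bytes(std_string))


class EnglishStopper(FakeStopper):
    STOPWORDS = {"the", "a", "of"}

    def __call__(self, term):
        # The subclass decides how to interpret the bytes it is given.
        try:
            text = term.decode("utf-8")
        except UnicodeDecodeError:
            return False      # binary term: treat as not a stopword
        return text in self.STOPWORDS


stopper = EnglishStopper()
print(call_from_cpp(stopper, b"the"))
print(call_from_cpp(stopper, b"\xff\xfe"))
```

 Passing bytes unconditionally means the wrapper never raises on the way
 into Python; any decoding error happens in the subclass, where it can be
 handled.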

 It looks like the feature freeze date for 12.10 is 23rd August, which is
 only just over 2 months away - if you want to see Python 3 support in a
 stable Xapian release by then, realistically you're going to have to be
 the one to actually make that happen.  As things are currently, it's not
 looking at all likely it would even be fixed on trunk by then.  It would
 certainly be good to sort out Python 3 support, but there's not yet much
 evidence of actual user demand, and Richard, who was the main one driving
 this, isn't very active in Xapian development right now.

-- 
Ticket URL: <http://trac.xapian.org/ticket/346#comment:28>
Xapian <http://xapian.org/>