[Xapian-tickets] [Xapian] #346: Python 3 support

Xapian nobody at xapian.org
Fri Sep 20 10:07:49 BST 2013

#346: Python 3 support
 Reporter:  olly                      |             Owner:  richard
     Type:  defect                    |            Status:  assigned
 Priority:  highest                   |         Milestone:  1.3.2
Component:  Xapian-bindings (Python)  |           Version:  SVN trunk
 Severity:  normal                    |        Resolution:
 Keywords:                            |        Blocked By:
 Blocking:                            |  Operating System:  All

Comment (by richard):

 I'm fairly strongly of the opinion that Xapian methods which can accept
 arbitrary bytes should accept only bytes, and return bytes; this makes it
 clear what's going on, and avoids confusion when a value which is added as
 a unicode string comes back from Xapian as bytes.  In Python 2, this change
 of type would tend not to be noticed, because almost everything accepted
 either str or unicode.  Or at least, it wouldn't be noticed until something
 far downstream of the conversion from unicode to bytes fell over on a rare
 piece of unicode data.

 In Python 3, most APIs are much fussier, and won't accept bytes and str
 values interchangeably.  Therefore, it's more important to be explicit when
 values are changing type.
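 A quick illustration using only built-in types (no Xapian calls): in
 Python 3 a str never compares equal to bytes, even to its own UTF-8
 encoding, and mixing the two raises TypeError instead of silently
 converting.

```python
# Python 3 keeps bytes and str strictly apart.
term_str = "caf\u00e9"
term_bytes = term_str.encode("utf-8")  # b'caf\xc3\xa9'

# A str never compares equal to bytes, even to its own UTF-8 encoding.
print(term_str == term_bytes)  # False

# Mixing the two raises TypeError rather than silently converting.
try:
    term_bytes + term_str
except TypeError as exc:
    print("TypeError:", exc)
```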

 I'm thinking in particular about things like Xapian::Document, where it's
 not unreasonable for a programmer to expect that if a term is added to a
 document, and then that term is read out of the document again, the
 returned term will be the same as the added term.
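 To make the round-trip concern concrete, here is a toy stand-in for a
 lenient binding that accepts either type but always hands back bytes.
 `LenientDocument` and its methods are hypothetical and are not Xapian's
 real Document class:

```python
class LenientDocument:
    """Hypothetical: accepts str or bytes, but stores and returns bytes."""

    def __init__(self):
        self._terms = []

    def add_term(self, term):
        # Silently encode str to UTF-8, as a lenient binding might.
        if isinstance(term, str):
            term = term.encode("utf-8")
        self._terms.append(term)

    def terms(self):
        return list(self._terms)


doc = LenientDocument()
doc.add_term("apple")
returned = doc.terms()[0]
print(returned == "apple")  # False: the term came back as b'apple'
```

 The term went in as a str and came out as bytes, so a naive equality
 check against the original value fails.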

 I agree with the concerns about this leading to lots of different "thin"
 wrappers being created, and therefore think that having variants which
 accept or return (unicode) strings (and raise an error if a return value
 cannot be converted to unicode) would be a good idea; the defaults should
 be strict about types, though.  The variants would just save some typing,
 but would still require the programmer to be explicit about what they're
 passing in.

 I suggest the suffix "_str" for the unicode versions; in Python 3, "str"
 is a unicode string, and "_str" is reasonably short.
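 A minimal pure-Python sketch of the proposed convention: strict bytes-only
 defaults plus "_str" variants.  `StrictDocument`, `add_term_str()` and
 `terms_str()` are hypothetical names for illustration, not Xapian's actual
 API, and UTF-8 is assumed as the encoding:

```python
class StrictDocument:
    """Hypothetical sketch of the proposed convention, not Xapian's real API.

    Default methods accept and return bytes only.  The "_str" variants accept
    and return str, assuming UTF-8 as the encoding, and raise if a stored term
    cannot be decoded.
    """

    def __init__(self):
        self._terms = []  # terms are always stored as bytes

    def add_term(self, term):
        if not isinstance(term, bytes):
            raise TypeError("add_term() requires bytes; use add_term_str() for str")
        self._terms.append(term)

    def add_term_str(self, term):
        if not isinstance(term, str):
            raise TypeError("add_term_str() requires str")
        self._terms.append(term.encode("utf-8"))

    def terms(self):
        return list(self._terms)

    def terms_str(self):
        # Strict decoding: raises UnicodeDecodeError for non-UTF-8 terms.
        return [t.decode("utf-8") for t in self._terms]


doc = StrictDocument()
doc.add_term(b"apple")
doc.add_term_str("pear")
print(doc.terms())      # [b'apple', b'pear']
print(doc.terms_str())  # ['apple', 'pear']
```

 Passing a str to `add_term()` fails immediately with TypeError, and a term
 that isn't valid UTF-8 makes `terms_str()` raise rather than return
 something silently mangled.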

 There are two objects which should accept unicode strings for some of
 their parameters: the query parser and the term generator.  These
 shouldn't have "bytes" variants; they always interpret the data as a
 string, and a Python 3 programmer should be working with str objects if
 the data is a string.
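 For these str-only entry points, the binding would only need a guard like
 the one below.  `check_text()` is a hypothetical helper, and the
 assumption that the underlying C++ layer is handed UTF-8 text is stated
 in the code:

```python
def check_text(text):
    """Hypothetical guard for a str-only entry point (e.g. text handed to a
    query parser or term generator).  Bytes are rejected outright."""
    if not isinstance(text, str):
        raise TypeError(f"expected str, got {type(text).__name__}; decode bytes first")
    # Assumption: the underlying C++ library is handed UTF-8 encoded text.
    return text.encode("utf-8")


print(check_text("hello world"))  # b'hello world'
```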

 query_parser.set_prefix is slightly awkward; the prefix typed by the user
 should be a "str", but the prefix applied internally to terms should be a
 "bytes".  I'm not sure if a .set_prefix_str() variant, in which the
 internal prefix can be specified as a str, makes sense.
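 A sketch of the type split described above, where the user-visible field
 name is a str but the term prefix is bytes, with an optional
 `set_prefix_str()` convenience variant.  `PrefixMap` and `term_for()` are
 hypothetical, not Xapian's QueryParser, and UTF-8 is assumed:

```python
class PrefixMap:
    """Hypothetical sketch of the set_prefix() type split, not Xapian's API."""

    def __init__(self):
        self._prefixes = {}  # user-visible field name (str) -> term prefix (bytes)

    def set_prefix(self, field, prefix):
        # The field name is typed by the user, so it is str; the term prefix
        # is applied to stored terms, so it is bytes.
        if not isinstance(field, str) or not isinstance(prefix, bytes):
            raise TypeError("set_prefix(field: str, prefix: bytes)")
        self._prefixes[field] = prefix

    def set_prefix_str(self, field, prefix):
        # Convenience variant: accept the internal prefix as str, encode as UTF-8.
        if not isinstance(prefix, str):
            raise TypeError("set_prefix_str(field: str, prefix: str)")
        self.set_prefix(field, prefix.encode("utf-8"))

    def term_for(self, field, word):
        # Build the internal term for a user query such as "title:hello".
        return self._prefixes[field] + word.encode("utf-8")


qp = PrefixMap()
qp.set_prefix("title", b"S")
qp.set_prefix_str("author", "A")
print(qp.term_for("title", "hello"))  # b'Shello'
print(qp.term_for("author", "olly"))  # b'Aolly'
```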

Ticket URL: <http://trac.xapian.org/ticket/346#comment:55>
Xapian <http://xapian.org/>