[Xapian-tickets] [Xapian] #346: Python 3 support
Xapian
nobody at xapian.org
Fri Sep 20 10:07:49 BST 2013
#346: Python 3 support
--------------------------------------+------------------------------
Reporter: olly                        | Owner: richard
Type: defect                          | Status: assigned
Priority: highest                     | Milestone: 1.3.2
Component: Xapian-bindings (Python)   | Version: SVN trunk
Severity: normal                      | Resolution:
Keywords:                             | Blocked By:
Blocking:                             | Operating System: All
--------------------------------------+------------------------------
Comment (by richard):
I'm fairly strongly of the opinion that Xapian methods which can accept
arbitrary bytes should accept only bytes, and return byte arrays; this
makes it clear what's going on, and avoids confusion when a value which is
added as a unicode string comes back from Xapian as bytes. In Python 2,
this change of type would tend not to be noticed, because almost
everything accepted either str or unicode. Or at least, it wouldn't be
noticed until something far downstream of the conversion from unicode to
bytes fell over on a rare piece of unicode data.

In Python 3, most things are much fussier, and won't accept bytes and str
values interchangeably. Therefore, it's more important to be explicit when
values are changing type.

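As a rough illustration of that fussiness, here is a small sketch in plain Python 3 (nothing Xapian-specific; the variable names are made up for the example). It shows that str and bytes never compare equal, that mixing them raises rather than converting silently, and that arbitrary bytes cannot always be decoded.

```python
# Python 3 keeps text (str) and binary data (bytes) strictly apart.
term_text = "caf\u00e9"                 # str: a sequence of Unicode code points
term_bytes = term_text.encode("utf-8")  # bytes: the UTF-8 encoding of that text

# Values that look alike but differ in type never compare equal.
same = (term_text == term_bytes)

# Mixing the two types raises instead of converting silently.
try:
    term_bytes + term_text
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Arbitrary bytes need not be valid UTF-8, so decoding can fail.
try:
    b"\xff\xfe".decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
```

In Python 2 the concatenation above would have attempted an implicit conversion, which is why the type change tended to go unnoticed there.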
I'm thinking in particular about things like Xapian::Document, where it's
not unreasonable for a programmer to expect that if a term is added to a
document, and then that term is read out of the document again, the
returned term will be the same as the added term.
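To make the round-trip expectation concrete, here is a minimal sketch. StrictDocument is a hypothetical stand-in written for this example, not the real Xapian::Document binding; it only assumes that terms are stored as bytes and that str input is rejected rather than encoded implicitly.

```python
class StrictDocument:
    """Hypothetical stand-in for a document that stores terms as bytes only."""

    def __init__(self):
        self._terms = set()

    def add_term(self, term):
        # Reject str outright rather than encoding it behind the caller's back.
        if not isinstance(term, bytes):
            raise TypeError("term must be bytes, not %s" % type(term).__name__)
        self._terms.add(term)

    def termlist(self):
        # Terms come back with the same type they went in with, sorted.
        return sorted(self._terms)


doc = StrictDocument()
doc.add_term(b"XAfoo")
doc.add_term("caf\u00e9".encode("utf-8"))  # the caller encodes explicitly
round_trip = doc.termlist()
```

Because only bytes go in and only bytes come out, what is read back is equal to what was added, with no hidden change of type.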

I agree with the concerns about this leading to lots of different "thin"
wrappers being created, and therefore think that having variants which
accept or return (unicode) strings (and raise an error if a return value
cannot be converted to unicode) would be a good idea; the defaults should
be strict about types, though. The variants would just save some typing,
but still make the programmer be explicit about what they're passing in.

I suggest the suffix "_str" for the unicode versions; in Python 3, "str"
is a unicode string, and "_str" is reasonably short.
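A sketch of how such "_str" variants might sit next to the strict defaults follows. SketchDocument and its method names are hypothetical, chosen only to mirror the suggested suffix, and UTF-8 is assumed as the encoding; this is not the actual bindings API.

```python
class SketchDocument:
    """Hypothetical sketch of strict defaults plus "_str" variants."""

    def __init__(self):
        self._terms = set()

    def add_term(self, term):
        # Default: strict, bytes only.
        if not isinstance(term, bytes):
            raise TypeError("add_term() requires bytes; use add_term_str()")
        self._terms.add(term)

    def add_term_str(self, term):
        # Variant: accepts str only and encodes it (UTF-8 assumed).
        if not isinstance(term, str):
            raise TypeError("add_term_str() requires str; use add_term()")
        self.add_term(term.encode("utf-8"))

    def termlist(self):
        # Default: returns bytes.
        return sorted(self._terms)

    def termlist_str(self):
        # Variant: raises UnicodeDecodeError if a stored term is not UTF-8.
        return [term.decode("utf-8") for term in self.termlist()]


doc = SketchDocument()
doc.add_term_str("caf\u00e9")
as_bytes = doc.termlist()
as_text = doc.termlist_str()
```

The caller still states the type in the method name on every call, so the variants save an explicit encode() or decode() without hiding the conversion.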

There are two objects which should accept unicode strings as some of their
parameters: the query parser and the term generator. These shouldn't have
"bytes" variants; they always interpret the data as a string, and a Python
3 programmer should be working with str objects if the data is a string.

query_parser.set_prefix is slightly awkward; the prefix typed by the user
should be a "str", but the prefix applied internally to terms should be a
"bytes". I'm not sure if a .set_prefix_str() variant, in which the
internal prefix can be specified as a str, makes sense.
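
The awkwardness can be sketched as follows. PrefixTable, set_prefix_str and make_term are hypothetical names invented for this illustration (they only mirror the method named in the comment and are not the real query parser API), and UTF-8 is assumed wherever text becomes bytes.

```python
class PrefixTable:
    """Hypothetical sketch of a field-name to term-prefix mapping."""

    def __init__(self):
        self._prefixes = {}

    def set_prefix(self, field, term_prefix):
        # field is what the user types in a query, so it is text (str);
        # term_prefix is prepended to stored terms, so it is bytes.
        if not isinstance(field, str):
            raise TypeError("field must be str")
        if not isinstance(term_prefix, bytes):
            raise TypeError("term_prefix must be bytes")
        self._prefixes[field] = term_prefix

    def set_prefix_str(self, field, term_prefix):
        # Possible variant: the internal prefix is given as str and encoded.
        if not isinstance(term_prefix, str):
            raise TypeError("term_prefix must be str")
        self.set_prefix(field, term_prefix.encode("utf-8"))

    def make_term(self, field, word):
        # A word parsed from query text is str; the resulting term is bytes.
        return self._prefixes[field] + word.encode("utf-8")


table = PrefixTable()
table.set_prefix("author", b"A")
table.set_prefix_str("title", "S")
author_term = table.make_term("author", "smith")
title_term = table.make_term("title", "xapian")
```

Both calls end up storing the same kind of value; the open question is only whether the second spelling is worth having.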
--
Ticket URL: <http://trac.xapian.org/ticket/346#comment:55>
Xapian <http://xapian.org/>