[Xapian-tickets] [Xapian] #346: Python 3 support
Xapian
nobody at xapian.org
Sun Jun 17 14:03:15 BST 2012
#346: Python 3 support
--------------------------------------+-------------------------------------
Reporter: olly                        | Owner: richard
Type: defect                          | Status: assigned
Priority: highest                     | Milestone: 1.3.2
Component: Xapian-bindings (Python)   | Version: SVN trunk
Severity: normal                      | Keywords:
Blockedby:                            | Platform: All
Blocking:                             |
--------------------------------------+-------------------------------------
Comment(by olly):
A global "encoding switch" doesn't really seem workable to me.
For starters, Xapian::sortable_unserialise() '''definitely''' needs to be
passed bytes (it takes a binary string as returned by
Xapian::sortable_serialise(), so passing a Unicode string converted to
UTF-8 makes no sense at all, and having to convert your binary input to
Unicode to pass it isn't going to work well), so there will need to be
exceptions which the switch doesn't affect, or else some settings of the
switch will render such functions unusable.
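
To illustrate why binary values can't be routed through a text encoding,
here is a minimal sketch in plain Python 3. It does not call Xapian at
all: struct.pack() is only a stand-in for "some binary serialised value"
and is not the format Xapian::sortable_serialise() actually produces, and
the helper name is made up for this example.

```python
import struct


def survives_utf8_decode(data: bytes) -> bool:
    """Return True if data happens to be valid UTF-8 text."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Stand-in for a binary serialised value (an assumption for illustration,
# NOT Xapian's real format): a big-endian IEEE 754 double.
binary_value = struct.pack(">d", 1.5)

# Ordinary text, encoded the way xapian-core expects text to be encoded.
text_value = "caf\u00e9".encode("utf-8")

print(survives_utf8_decode(text_value))
print(survives_utf8_decode(binary_value))
```

Arbitrary binary data is in general not valid UTF-8, so a function which
takes such data has to accept bytes regardless of any text setting.
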
The problem with falling back to "manage everything manually" whenever a
single encoding setting isn't enough is that wanting both text and binary
data isn't at all esoteric. For example, look at omindex, which indexes
text but also adds a (binary) document checksum as a value for collapsing
identical documents. Alternatively, you could change the global encoding
each time you want to pass the other kind of data, but then you're
flipping it back and forth for every document you index.
And if you want to write something reusable you'll find yourself having to
save the encoding state, and then set it to what you want, do your calls
to Xapian, and then restore the encoding state. That really seems worse
than having to specify the encoding at every call site.
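
As a rough sketch of what that save/set/restore dance would look like for
reusable code: everything below is hypothetical. No such global switch
exists in the bindings, and set_encoding(), get_encoding(), encoding() and
add_checksum() are names invented purely for this illustration.

```python
from contextlib import contextmanager

# Hypothetical process-wide switch (an assumption, not a real Xapian API).
_encoding = "utf-8"


def set_encoding(name):
    global _encoding
    _encoding = name


def get_encoding():
    return _encoding


@contextmanager
def encoding(name):
    """Save the global state, switch it, and restore it afterwards."""
    saved = get_encoding()
    set_encoding(name)
    try:
        yield
    finally:
        # Restore even if the body raised, so callers aren't left with
        # a setting they didn't choose.
        set_encoding(saved)


def add_checksum(values, checksum):
    """Reusable helper which needs binary mode whatever the caller set."""
    with encoding("binary"):
        # Record the mode in effect alongside the data, standing in for
        # the calls into Xapian.
        values.append((get_encoding(), checksum))
```

Every reusable function which touches binary data would need this
wrapping, which is more ceremony than stating bytes or text at the call
itself.
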
In terms of text encodings, xapian-core only really supports UTF-8. In a
lot of places, you just get back what you put in, but anything that
actually looks at the contents as text expects UTF-8. So the only
settings of the "encoding" switch which make sense on the C++ side are
UTF-8 and binary data.
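
With only those two cases, a per-call conversion is simple to state. The
following is a sketch of one possible approach, not a description of how
the bindings behave: to_bytes() is a hypothetical helper which passes
bytes through untouched and encodes Python 3 str as UTF-8.

```python
def to_bytes(value):
    """Convert an argument to the byte string handed to the C++ side.

    bytes are treated as binary data and passed through unchanged;
    str is treated as text and encoded as UTF-8.
    """
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return value.encode("utf-8")
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


as_text = to_bytes("caf\u00e9")    # text: encoded as UTF-8
as_binary = to_bytes(b"\xff\x00")  # binary: left exactly as given
```

Here the type of the argument carries the text/binary distinction, so no
global state is needed.
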
In response to Barry:
>> I guess if you're trying to get everyone onto Python 3 for Ubuntu,
>> you've looked at quite a few upstreams already - has a standard pattern
>> for resolving such situations already emerged?
> Well, the only upstream I currently have to support is software-center,
> since we're only converting to Python 3 on the standard desktop image (for
> 12.10 anyway). So its use case will be my primary driver. We have maybe a
> dozen reverse depends on python-xapian in total.
I meant other upstream Python projects you need to get on to Python 3,
not reverse dependencies of python-xapian (though getting the reverse
dependencies ported to "python3-xapian" could take significant work,
especially if we totally throw out compatibility with the current
python-xapian API). I was wondering if there was a standard pattern for
wrapping an interface like this.
> One big question is this: what version of Python 2 do you still need to
> support (please tell me, nothing earlier than 2.6 :), and how should we
> handle cases where the API has to change for Python 3?
Adding Python 3 support really shouldn't change which Python 2 versions
are supported. If the changes are invasive enough that this is really an
issue, then I suggest we split the Python 3 bindings into a separate
subdirectory - we already have different versions of all the tests, though
currently they're mostly the result of 2to3, plus a few tweaks.
Especially for 1.2.x, we really don't want to be risking breaking Python 2
support - our general policy is not to break compatibility with a version
of other software within a Xapian stable release series without a very
good reason.
We've not made a final decision on the versions of things we'll aim to
support in 1.4.x - 2.6 may well be a sane cut-off there, but that doesn't
really help since you want this support in 1.2.x.
As for the minimum 3.x to support, I certainly wouldn't worry about 3.0 -
my impression is that the early adopters who actually tried it will have
quickly moved on to the new cutting edge, while the conservative types
will have feared the ".0".
--
Ticket URL: <http://trac.xapian.org/ticket/346#comment:35>
Xapian <http://xapian.org/>