[Xapian-devel] Moving indextext.cc into core.

James Aylett james-xapian at tartarus.org
Thu Mar 29 13:01:56 BST 2007


On Wed, Mar 28, 2007 at 06:45:30PM +0100, Olly Betts wrote:

> But now isn't a good time to be making such major changes anyway.  We
> need to focus on releasing 1.0, not destabilising SVN HEAD!

I have an image of SVN HEAD as a weeble...

> For example, the issue of character encoding in the Python (and other) 
> bindings needs resolving.  Your Python knowledge is better than mine,
> but I have a feeling that Python uses wide characters for unicode
> internally, so we probably want to perform conversion to/from utf-8 when
> calling Xapian - that will make it hard to put binary data in terms,
> values, document data, etc but getting Unicode string handled is more
> important for most users I believe.  I'm hoping the conversions can be
> achieved with suitable typemaps.

Python has two distinct types of strings: simple and Unicode. Unicode
strings are stored as UCS-2 or UCS-4 internally (this is compile-time
dependent; check sys.maxunicode to see which you've got; I have no
idea what most distros do). Simple strings are 8-bit and are *usually*
treated as being in the relevant encoding for the locale (but this is
up to the program in question; strictly, if the underlying system is
ASCII, then the 7-bit range will be ASCII, but there's a rule about
EBCDIC for more exotic systems that is irrelevant to most people).
8-bit strings are
fine for binary data, so the problem is what happens when you try to
(say) add_posting(u'Unicode string',...)  -- a typemap here should be
able to simply .encode('utf-8') with no undue side effects. (The
trouble with typemaps is that they are in C, not Python, so it's
actually an internal routine rather than the wrapped version, but
that's not a big deal: PyUnicode_Encode(), I think, does the trick.)
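
As a rough Python-level sketch of what such a typemap would do (the
real thing would live in C inside the bindings): Unicode strings get
encoded to UTF-8, and 8-bit strings pass through untouched so binary
data still works. The helper name to_utf8 is made up for illustration
and is not part of Xapian or its bindings.

```python
import sys

# The Unicode string type: `unicode` on Python 2, `str` on Python 3.
TEXT_TYPE = type(u'')


def to_utf8(value):
    """Hypothetical stand-in for the typemap's conversion step.

    Unicode strings are encoded to UTF-8; anything else (8-bit
    strings, i.e. possibly binary data) is returned unchanged.
    """
    if isinstance(value, TEXT_TYPE):
        return value.encode('utf-8')
    return value


# 0xFFFF indicates a UCS-2 build, 0x10FFFF a UCS-4 build.
# (Python 3.3 and later always report 0x10FFFF.)
print(hex(sys.maxunicode))

term = to_utf8(u'caf\xe9')    # Unicode in, UTF-8 bytes out
blob = to_utf8(b'\x00\xff')   # binary data is left alone
```

Checking the type first, rather than encoding unconditionally, is what
keeps binary terms, values and document data usable alongside Unicode
input.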

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


