[Xapian-discuss] UTF-8: what is done and what is not?

Olly Betts olly at survex.com
Fri Nov 3 00:49:08 GMT 2006


On Thu, Nov 02, 2006 at 08:46:36AM -0500, tata 668 wrote:
> I'm aware of the UTF-8 branch here: 
> http://www.oligarchy.co.uk/xapian/branches/utf8/ , but I'd like more 
> information about what it contains and if it's enough for me.

The current status is summarised here:

http://wiki.xapian.org/Utf8Support

I'm in the process of releasing 0.9.8 (which fixes various minor
problems reported since 0.9.7), so I'm very close to merging the UTF-8
branch in, and the rate of visible progress should pick up after that.

> Currently, I wrote my own word splitter to index the data and my own 
> queryparser. They are not perfect and I would like to use built-in Xapian 
> objects instead.

There isn't currently a word splitter in the core library, but
Xapian::QueryParser now works with UTF-8 on the branch, so you can
probably use that already.
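
Until the library has a splitter, a home-grown one can stay quite small.
What follows is only a rough sketch, not code from Xapian or from the
branch: split_words_utf8 is a hypothetical helper name, and it assumes
the input is already valid UTF-8. It relies on the fact that every byte
of a multi-byte UTF-8 sequence has its high bit set, so those bytes can
be kept as word characters without decoding them:

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, not part of Xapian: split UTF-8 text into words.
// A byte counts as a word character if it is an ASCII letter or digit,
// or if its high bit is set (i.e. it belongs to a multi-byte UTF-8
// sequence). Everything else (ASCII spaces, punctuation) is a separator.
std::vector<std::string> split_words_utf8(const std::string& text) {
    std::vector<std::string> words;
    std::string current;
    for (unsigned char byte : text) {
        bool word_char = byte >= 0x80 || std::isalnum(byte);
        if (word_char) {
            current.push_back(static_cast<char>(byte));
        } else if (!current.empty()) {
            // A separator ends the word collected so far.
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

The obvious limitation is that non-ASCII punctuation and spaces (a
no-break space, say) end up inside words, and there is no case folding
or normalisation, so treat it as a stopgap rather than a real tokeniser.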

I've not tested UTF-8 from any of the bindings yet.  Some languages
standardise on a particular internal string representation, so there
could be problems there (I don't know how PHP handles this).  But I'd
certainly encourage you to try it and let us know whether it works or
what goes wrong.

Cheers,
    Olly


