[Xapian-discuss] UTF8 support plans (without stemming)

Olly Betts olly at survex.com
Wed Jun 29 04:19:57 BST 2005


On Thu, Apr 28, 2005 at 01:37:02PM +0100, Olly Betts wrote:
> On Thu, Apr 28, 2005 at 11:08:28AM +0400, Alexandre wrote:
> > it's very hard to make it work with other languages (for example, with
> > Russian) - there are lots of problems inside...
> 
> The problems aren't anywhere near as great as you seem to expect, at
> least in part because Unicode support has always been a goal we've kept
> in mind.

And here's a search, implemented using Xapian, which works in Russian
(or Chinese or Elvish or ...):

http://rain.gmane.org/?query=%D0%B2%D0%BE%D0%B7%D0%BC%D0%BE%D0%B6%D0%BD%D0%BE

This uses a patched version of the QueryParser.  Currently I'm using
glib's Unicode routines, but I'm not sure we really want to add a
dependency on glib when we'd only use a very small part of it.

I already have C code for handling UTF-8.  I'm going to look at what
else is available for Unicode equivalents of "isalpha" etc.
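
For readers wondering what "handling UTF-8" involves at the byte level, here is a minimal C sketch of a strict UTF-8 decoder. It is not the code referred to above; the function names utf8_decode() and utf8_length() are hypothetical, chosen only for illustration. It covers decoding only: a Unicode-aware "isalpha" additionally needs Unicode character-property tables (glib supplies these, e.g. g_unichar_isalpha()), which this sketch does not attempt.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s, with len
 * bytes available.  On success, store the code point in *cp and return
 * the number of bytes consumed (1-4).  Return 0 for malformed input:
 * invalid lead byte, truncated sequence, bad continuation byte, overlong
 * encoding, UTF-16 surrogate, or a value above U+10FFFF. */
static size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    uint32_t c;
    size_t n, i;

    if (len == 0) return 0;
    c = s[0];
    if (c < 0x80) { *cp = c; return 1; }           /* plain ASCII */
    else if ((c & 0xE0) == 0xC0) { n = 2; c &= 0x1F; }
    else if ((c & 0xF0) == 0xE0) { n = 3; c &= 0x0F; }
    else if ((c & 0xF8) == 0xF0) { n = 4; c &= 0x07; }
    else return 0;                                  /* stray continuation or invalid lead */

    if (len < n) return 0;                          /* sequence cut short */
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;        /* must be 10xxxxxx */
        c = (c << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates and out-of-range values. */
    if ((n == 2 && c < 0x80) || (n == 3 && c < 0x800) || (n == 4 && c < 0x10000))
        return 0;
    if (c >= 0xD800 && c <= 0xDFFF) return 0;
    if (c > 0x10FFFF) return 0;

    *cp = c;
    return n;
}

/* Hypothetical helper: count the code points in a NUL-terminated UTF-8
 * string, or return -1 if the string is not well-formed UTF-8. */
static long utf8_length(const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t len = strlen(str), pos = 0;
    long count = 0;
    uint32_t cp;

    while (pos < len) {
        size_t n = utf8_decode(s + pos, len - pos, &cp);
        if (n == 0) return -1;
        pos += n;
        count++;
    }
    return count;
}
```

As an example, the query in the gmane URL above is percent-encoded UTF-8: the bytes D0 B2 decode to U+0432 (Cyrillic "в"), and the full 16-byte query decodes to the 8 code points of "возможно".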

In the meantime, if anyone is interested in my somewhat hacked-up patch
for a UTF-8 savvy QueryParser, let me know and I'll send you a copy.

Cheers,
    Olly
