[Xapian-discuss] Re: Is utf-8 enabled on version 0.99?

Olly Betts olly at survex.com
Mon Dec 18 00:43:58 GMT 2006


On Sun, Dec 17, 2006 at 11:05:49PM +0800, Andrey Kong wrote:
> You guys are talking about the UTF-8 support in Xapian.
> I wonder what the difference is with and without UTF-8 support?
> 
> I'm using 0.9.9 now for Chinese data (in UTF-8) and it seems to be working
> perfectly. Does this mean I already have UTF-8 in my Xapian, or will it run
> even better once UTF-8 support is plugged in?

Most of the core Xapian library doesn't care what character encoding
is in use - it just treats strings as opaque blobs of data, and it's
always been perfectly possible to use UTF-8 if you wish.
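
To illustrate what "opaque blobs" means in practice, here is a minimal
sketch in Python. ToyIndex is a hypothetical stand-in written for this
example, not Xapian's API or implementation; it assumes only that terms
are stored and matched as raw bytes without ever being decoded.

```python
# Toy stand-in for an encoding-agnostic index: terms are raw bytes and are
# only ever compared for equality, never decoded or interpreted.
from collections import defaultdict


class ToyIndex:
    """Hypothetical illustration only; this is not Xapian's API."""

    def __init__(self):
        # Maps term bytes -> set of document ids containing that term.
        self.postings = defaultdict(set)

    def add_document(self, doc_id, terms):
        for term in terms:
            self.postings[term].add(doc_id)

    def search(self, term):
        # .get() avoids creating an empty entry for unknown terms.
        return sorted(self.postings.get(term, set()))


index = ToyIndex()
index.add_document(1, ["中文".encode("utf-8"), "搜索".encode("utf-8")])
index.add_document(2, ["搜索".encode("utf-8"), b"xapian"])

hits = index.search("搜索".encode("utf-8"))
print(hits)
```

Nothing in the sketch needs to know that the bytes are UTF-8. The flip
side is that the same text in a different encoding is a different byte
string, so indexing and searching have to agree on one encoding.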

There are a couple of parts which do care, however.  One is the stemming
algorithms (Xapian::Stem class) and the other is the query parser
(Xapian::QueryParser class).
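
As a rough illustration of why code that manipulates words (rather than
just storing them) has to know the encoding, here is a small Python
sketch. It isn't Xapian code, and lowercasing is used only as a
convenient example of a per-character operation.

```python
word = "ÉCOLE"
raw = word.encode("utf-8")

# Byte-level lowercasing only touches ASCII letters, so the two bytes
# that encode the accented capital are left alone.
bytewise = raw.lower().decode("utf-8")

# Decoding first lets the operation see whole characters.
charwise = raw.decode("utf-8").lower()

print(bytewise)
print(charwise)
```

The byte-level version lowercases the ASCII letters but leaves the
accented capital as it was, while the character-level version handles
the whole word.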

We don't have a stemming algorithm for Chinese (it's not an inflected
language, so the concept doesn't really apply), so if you're doing
your own query parsing (or if you don't have user-entered queries to
parse) then it won't make any difference to you.

Omega needs to worry more about character encodings, so that's actually
where most of the UTF-8 work has been.

Cheers,
    Olly


