[Xapian-discuss] Encoding Oddities
Olly Betts
olly at survex.com
Sat Jan 29 12:17:50 GMT 2011
On Tue, Jan 25, 2011 at 04:49:56PM -0800, Johannes Fahrenkrug wrote:
> Ruby's UTF-8 "support" is a joke in 1.8.x. But that was exactly the
> problem. The "downcase" method of Ruby's String class didn't downcase
> UTF-8 characters. There are two ways to get around it: If you're using
> Rails, use "a string".mb_chars.downcase. Otherwise, require the
> "unicode" gem and use Unicode::downcase("a string").
I'm not familiar with xapit, but at the Xapian API level you should be
able to feed UTF-8 text to Xapian::TermGenerator for indexing, and
UTF-8 query strings to Xapian::QueryParser for parsing, and case folding
will be done for you for any characters which have a lowercase
equivalent in the Unicode tables.
So I guess you or xapit aren't using TermGenerator and QueryParser (or
are only using one of them)? If you are, it sounds like a Xapian bug,
but not one I can reproduce.
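
To illustrate the sort of mismatch I mean, here is a minimal Ruby sketch. It doesn't use Xapian or xapit at all. The helper names are made up for illustration, and it assumes Ruby 2.4 or later, where String#downcase is Unicode-aware and the :ascii option restricts it to A-Z (roughly the 1.8.x behaviour described above). If one side folds case the Unicode way and the other only folds ASCII, the resulting terms differ:

```ruby
# Hypothetical helpers for illustration only; not part of Xapian or xapit.

# Approximates Ruby 1.8.x String#downcase: only A-Z are lowercased,
# so non-ASCII letters such as "Ü" are left untouched.
def ascii_only_fold(text)
  text.downcase(:ascii)
end

# Unicode-aware lowercasing (the default behaviour in Ruby 2.4+).
def unicode_fold(text)
  text.downcase
end

# Term as a Unicode-aware indexing step would store it.
indexed_term = unicode_fold("Über")

# Term as an ASCII-only folding step on the query side would produce it.
query_term = ascii_only_fold("Über")

# The two terms differ, so a lookup for query_term would miss indexed_term.
puts "mismatched folding, terms equal: #{indexed_term == query_term}"

# Folding the same way on both sides makes the terms agree.
puts "consistent folding, terms equal: #{indexed_term == unicode_fold('Über')}"
```

The same applies whichever side does the folding: what matters is that indexing and query parsing agree.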
Cheers,
Olly