[Xapian-discuss] Encoding Oddities

Johannes Fahrenkrug jfahrenkrug at gmail.com
Mon Jan 24 22:07:14 GMT 2011


Hi,

I'm new to the list, but I've been using Xapian along with the Ruby
bindings and Xapit for over 1.5 years and it has been working great.
But now I've run into a very strange encoding issue.

I'm using Xapian 1.0.11 on Solaris.

This is the issue: I'm pulling ISO-8859-15 encoded data from a legacy
database and I'm indexing it. Some of that data contains German umlaut
characters. When I search for those words, Xapian finds nothing. That
should not surprise me since the docs say that Xapian expects UTF-8
encoded strings. So I use Iconv to convert the strings from
ISO-8859-15 to UTF-8 before I pass them to Xapian to be indexed, but
it still doesn't work. The weird thing is, however, that when I put
a UTF-8 string literal into my Ruby code and return it in place of the
actual string that should be indexed, it works, even with umlauts. So
a UTF-8 string LITERAL works, but a UTF-8 string that has been
converted from ISO-8859-15 does not.
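
One way to narrow this down might be to compare the raw bytes of a
converted string with those of a UTF-8 literal. The following is only
a minimal sketch, not my actual indexing code: it uses String#encode
(Ruby 1.9 or later) instead of Iconv, the helper name latin9_to_utf8
is hypothetical, and the sample word "Müller" is made up for
illustration.

```ruby
# encoding: utf-8

# Hypothetical helper: treats the raw bytes from the legacy database as
# ISO-8859-15 (an assumption about the source data) and transcodes them
# to UTF-8. Uses String#encode rather than Iconv.
def latin9_to_utf8(raw)
  raw.dup.force_encoding(Encoding::ISO_8859_15).encode(Encoding::UTF_8)
end

# Renders a string's bytes as space-separated hex for easy comparison.
def hex_bytes(str)
  str.bytes.map { |b| format("%02x", b) }.join(" ")
end

# "Müller" as ISO-8859-15 bytes: the u-umlaut is the single byte 0xFC.
legacy  = [0x4D, 0xFC, 0x6C, 0x6C, 0x65, 0x72].pack("C*")

# The same word written as a UTF-8 literal.
literal = "M\u00FCller"

converted = latin9_to_utf8(legacy)

# If the conversion worked, both lines show the umlaut as c3 bc
# and the final comparison prints true.
puts hex_bytes(converted)
puts hex_bytes(literal)
puts converted.bytes == literal.bytes
```

If the two byte dumps differ for a real database value, the conversion
step is the likely culprit; if they match, the problem is probably
somewhere after the conversion.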

Does this sound familiar to anyone? Any help would be appreciated!

- Johannes

-- 
springenwerk.com | github.com/jfahrenkrug | twitter.com/jfahrenkrug


