[Xapian-discuss] Encoding Oddities

Johannes Fahrenkrug jfahrenkrug at gmail.com
Tue Jan 25 01:50:59 GMT 2011


Hi,

After some more digging, the problem seems to be related to capital
umlauts. When I index the UTF-8 string "ägypten", I can search for
"ägypten" and for "Ägypten" and get results for both searches.

But when I index the UTF-8 string "Ägypten", I don't get any results,
whether I search for "ägypten" or for "Ägypten".
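
To see what actually differs between the two spellings, the raw bytes
can be inspected. This is only a minimal diagnostic sketch in plain
Ruby, not part of my indexing code: the helper name hex_bytes is made
up for illustration, and the downcase comparison assumes Ruby 2.4 or
later, where String#downcase handles non-ASCII letters.

```ruby
# Hypothetical diagnostic: compare the UTF-8 bytes of both spellings.
upper = "\u00C4gypten" # "Ägypten"
lower = "\u00E4gypten" # "ägypten"

# Illustrative helper: render a string's bytes as space-separated hex.
def hex_bytes(str)
  str.bytes.map { |b| format("%02X", b) }.join(" ")
end

upper_hex = hex_bytes(upper) # C3 84 67 79 70 74 65 6E
lower_hex = hex_bytes(lower) # C3 A4 67 79 70 74 65 6E

# Assumes Ruby 2.4+: downcase folds "Ä" (U+00C4) to "ä" (U+00E4).
folded_matches = (upper.downcase == lower)

puts upper_hex
puts lower_hex
puts folded_matches
```

Only the second byte of the first character differs (84 versus A4), so
whatever lowercases the term has to understand UTF-8 and not just
ASCII.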

Is that a bug or am I missing something?

Cheers,

Johannes

On Mon, Jan 24, 2011 at 2:07 PM, Johannes Fahrenkrug
<jfahrenkrug at gmail.com> wrote:
> Hi,
>
> I'm new to the list but I've been using Xapian along with the Ruby
> bindings and Xapit for over 1.5 years and it's working great. But now
> I've run into a very strange encoding issue.
>
> I'm using Xapian 1.0.11 on Solaris.
>
> This is the issue: I'm pulling ISO-8859-15 encoded data from a legacy
> database and I'm indexing it. Some of that data contains German Umlaut
> characters. When I search for those words, Xapian finds nothing. That
> should not surprise me since the docs say that Xapian expects UTF-8
> encoded strings. So I use Iconv to convert the strings from
> ISO-8859-15 to UTF-8 before I pass them to Xapian to be indexed, but it
> still doesn't work. The weird thing is, however, that when I just put
> a UTF-8 string literal into my Ruby code and return it in place of the
> actual string that should be indexed, it works, even with umlauts. So
> a UTF-8 string LITERAL works, but a UTF-8 string that has been
> converted from ISO-8859-15 does not.
>
> Does this sound familiar to anyone? Any help would be appreciated!
>
> - Johannes
>
> --
> springenwerk.com | github.com/jfahrenkrug | twitter.com/jfahrenkrug
>
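
One way to test the literal-versus-converted claim from the quoted
message is to compare both strings byte for byte. The sketch below is
an assumption-laden stand-in, not the original code: it uses
String#encode (Ruby 1.9 and later) in place of Iconv, which was removed
from the standard library in Ruby 2.0, and the sample bytes are made up
to represent what the legacy database might return.

```ruby
# Hypothetical sample: "Ägypten" as ISO-8859-15 bytes (0xC4 is "Ä").
legacy = "\xC4gypten".dup.force_encoding(Encoding::ISO_8859_15)

# Swapped-in technique: String#encode instead of Iconv.
converted = legacy.encode(Encoding::UTF_8)

# The UTF-8 literal that is known to work when indexed directly.
literal = "\u00C4gypten"

same_bytes = (converted.bytes == literal.bytes)
valid_utf8 = converted.valid_encoding?

puts converted.encoding # UTF-8
puts same_bytes
puts valid_utf8
```

If the bytes match here but the search still fails in the real
application, the difference is probably introduced somewhere between
the conversion and the call into Xapian.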



-- 
springenwerk.com | github.com/jfahrenkrug | twitter.com/jfahrenkrug


