[Xapian-discuss] Encoding Oddities

Adam Sjøgren asjo at koldfront.dk
Tue Jan 25 11:48:38 GMT 2011


On Mon, 24 Jan 2011 17:50:59 -0800, Johannes wrote:

> After some more digging it seems to have to do with capital Umlauts.
> So when I index the UTF-8 String "ägypten", I can search for "ägypten"
> and for "Ägypten" and get results for both searches.

> But when I index the UTF-8 string "Ägypten", I don't get any results,
> whether I search for "ägypten" or for "Ägypten".

> Is that a bug or am I missing something?

It sounds like the Ä isn't being lowercased at indexing time, as it
should be, so the Ä ends up in the "prefix" part of the term rather than
in the "content" (the index gets something like ZÄgypt instead of
Zägypt, and you would get a hit if you searched for "gypten").

How good is Ruby's support for lowercasing UTF-8 characters?
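
One quick way to check is to compare ASCII-only lowercasing with
Unicode-aware lowercasing on the word in question. This is only a rough
sketch and assumes Ruby 2.4 or later, where String#downcase maps
non-ASCII letters and accepts an :ascii option; in older Rubies
String#downcase only touches A-Z, which would match the symptom
described. The helper name normalise_for_indexing is made up for
illustration and is not part of Xapian or its Ruby bindings.

```ruby
# Hypothetical helper (not a Xapian API): lowercase text before it is
# handed to whatever does the indexing.
def normalise_for_indexing(text)
  text.downcase # Unicode-aware on Ruby 2.4+ (assumption: modern Ruby)
end

word = "Ägypten"

# ASCII-only lowercasing leaves the capital umlaut untouched; this mimics
# what an ASCII-only downcase would feed to the indexer.
ascii_only = word.downcase(:ascii)

# Unicode-aware lowercasing converts the leading Ä as well.
unicode = normalise_for_indexing(word)

puts "ASCII-only:    #{ascii_only}"
puts "Unicode-aware: #{unicode}"
```

If the first line still starts with a capital Ä in your setup, the
lowercasing step is the likely culprit, and lowercasing the text
yourself before indexing might be a workaround.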


  Just a guess,

    Adam

-- 
 "Här kommer rädslan, gamle vän                               Adam Sjøgren
  När alla fjärilar i magen vaknar upp                   asjo at koldfront.dk
  Viskar välkommen hem"



