[Xapian-discuss] Encoding Oddities
Adam Sjøgren
asjo at koldfront.dk
Tue Jan 25 11:48:38 GMT 2011
On Mon, 24 Jan 2011 17:50:59 -0800, Johannes wrote:
> After some more digging it seems to have to do with capital Umlauts.
> So when I index the UTF-8 String "ägypten", I can search for "ägypten"
> and for "Ägypten" and get results for both searches.
> But when I index the UTF-8 string "Ägypten", I don't get any results,
> whether I search for "ägypten" or for "Ägypten".
> Is that a bug or am I missing something?
It sounds like the Ä isn't being lowercased as it should at indexing
time, so the Ä gets included in the "prefix" of the term rather than the
"content" (so the index gets something like ZÄgypt instead of Zägypt,
and you'll get a hit if you search for "gypten").
How good is Ruby's support for lowercasing UTF-8 characters?
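A quick way to check whether the lowercasing step is the culprit is to compare ASCII-only case folding with full Unicode case folding on the same word. The sketch below assumes Ruby 2.4 or later, where String#downcase does Unicode case mapping by default and takes an :ascii option to restrict it to A-Z. Older Rubies may behave like the ASCII-only variant, but that is an assumption on my part. The helper name lowercase_for_index is made up for illustration and is not part of Xapian or its Ruby bindings.

```ruby
# Sketch only: assumes Ruby >= 2.4 (String#downcase with the :ascii option).
word = "\u00C4gypten" # "Ägypten", written with an escape to avoid source-encoding issues

# ASCII-only folding touches A-Z only, so the leading Ä is left as it is.
ascii_folded = word.downcase(:ascii)

# Full Unicode case mapping turns Ä (U+00C4) into ä (U+00E4).
unicode_folded = word.downcase

# Hypothetical helper: lowercase text before handing it to the indexer,
# so the indexed term and the query term are folded the same way.
def lowercase_for_index(text)
  text.downcase
end

puts ascii_folded   # prints "Ägypten"
puts unicode_folded # prints "ägypten"
```

If the two results differ in your Ruby, the text probably needs to be lowercased with a Unicode-aware method before it is indexed.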
Just a guess,
Adam
--
"Här kommer rädslan, gamle vän
När alla fjärilar i magen vaknar upp
Viskar välkommen hem"
Adam Sjøgren
asjo at koldfront.dk