[Xapian-discuss] indexing words with alternative spellings
Michel Pelletier
pelletier.michel at gmail.com
Tue May 11 17:00:40 BST 2010
Different programming languages have different libraries for dealing
with this issue. We use one for Python called 'translitcodec' which can
do both long (ä -> ae) and short (ä -> a) conversion. It's very likely
there is a similar library for whatever language you are using.
http://pypi.python.org/pypi/translitcodec/0.1
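
To make the long/short distinction concrete, here is a rough sketch using
only the Python standard library. It is not how translitcodec is
implemented and does not use its API; the tables and the function names
(fold_long, fold_short, spelling_variants) are made up for illustration
and only cover the letters mentioned in this thread. The idea would be to
add every variant of a word as a term when indexing, and to fold the query
the same way before searching.

```python
import unicodedata

# Hypothetical table: two-letter ("long") spellings for the letters
# listed in the original question. Not exhaustive.
LONG_FORMS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "æ": "ae", "ø": "oe", "å": "aa", "ß": "ss",
}

# Letters that have no Unicode decomposition, so stripping accents
# alone would leave them untouched in the "short" form.
SHORT_EXTRAS = {"æ": "ae", "ø": "o", "ß": "ss"}


def fold_long(text):
    """Lowercase and replace special letters by their long spelling."""
    # NFC first, so 'a' + combining diaeresis is seen as a single 'ä'.
    text = unicodedata.normalize("NFC", text.lower())
    return "".join(LONG_FORMS.get(ch, ch) for ch in text)


def fold_short(text):
    """Lowercase and reduce special letters to their base letter."""
    text = unicodedata.normalize("NFC", text.lower())
    text = "".join(SHORT_EXTRAS.get(ch, ch) for ch in text)
    # NFKD splits 'ä' into 'a' + combining mark; drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def spelling_variants(word):
    """All spellings a word could be indexed (or searched) under."""
    return {word.lower(), fold_long(word), fold_short(word)}


if __name__ == "__main__":
    print(sorted(spelling_variants("Schäfer")))
```

With something like this, a document containing 'Schäfer' would be indexed
under 'schäfer', 'schaefer' and 'schafer', so a query for either
'schaefer' or 'schäfer' finds it. Whether you want the short form as well
depends on how much extra matching you are willing to accept.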
-Mike
On Tue, May 11, 2010 at 6:18 AM, Per Jessen <per at computer.org> wrote:
> Some languages (e.g. German and Danish) have special letters that are
> often written using two-letter combinations when the appropriate
> keyboard or medium is not available:
>
> ä = ae
> ü = ue
> ö = oe
> æ = ae
> ø = oe
> å = aa
> ß = ss
>
> (there are undoubtedly far more examples than those)
>
> As a user of an index, I would like to be able to search for
> e.g. "schaefer" and get matches on both 'ae' and 'ä' returned. Same if
> I searched on 'schäfer'. Is this something I would need to take into
> account when I do the indexing?
>
>
> /Per Jessen, Zürich
>
>