[Xapian-discuss] indexing words with alternative spellings

Michel Pelletier pelletier.michel at gmail.com
Tue May 11 17:00:40 BST 2010


Different programming languages have their own libraries for dealing
with this issue.  We use one for Python called 'translitcodec', which
can do both long (ä -> ae) and short (ä -> a) conversion.  There is very
likely a similar library for whatever language you are using.

http://pypi.python.org/pypi/translitcodec/0.1
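
To illustrate the long and short conversions, here is a rough
standard-library sketch.  It is not translitcodec's implementation or
API: the LONG_FORMS table and the fold_long/fold_short names are made up
for this example, and the table covers only the letters from the
original question.

```python
import unicodedata

# Hypothetical hand-written table for the "long" form. It only covers the
# letters listed in the quoted message; a real library covers far more.
LONG_FORMS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "æ": "ae", "ø": "oe", "å": "aa", "ß": "ss",
}


def fold_long(text):
    """Lowercase, then replace each special letter with its two-letter form."""
    return "".join(LONG_FORMS.get(ch, ch) for ch in text.lower())


def fold_short(text):
    """Lowercase, then drop combining marks (ä -> a).

    Letters with no Unicode decomposition (ß, æ, ø) pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ("Schäfer", "schaefer", "Zürich"):
        print(word, fold_long(word), fold_short(word))
```

If the same folding is applied to the text at index time and to the
query at search time, "schaefer" and "schäfer" end up as the same term,
so either spelling would match.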

-Mike

On Tue, May 11, 2010 at 6:18 AM, Per Jessen <per at computer.org> wrote:
> Some languages (e.g. German and Danish) have special letters that are
> often written using two-letter combinations when the appropriate
> keyboard or medium is not available:
>
> ä = ae
> ü = ue
> ö = oe
> æ = ae
> ø = oe
> å = aa
> ß = ss
>
> (there are undoubtedly far more examples than those)
>
> As a user of an index, I would like to be able to search for
> e.g. "schaefer" and get matches on both 'ae' and 'ä' returned. Same if
> I searched on 'schäfer'.  Is this something I would need to take into
> account when I do the indexing?
>
>
> /Per Jessen, Zürich
>
>


