[Xapian-discuss] indexing words with alternative spellings

Olly Betts olly at survex.com
Thu May 13 03:06:44 BST 2010


On Tue, May 11, 2010 at 03:18:38PM +0200, Per Jessen wrote:
> Some languages (e.g. German and Danish) have special letters that are
> often written using two-letter combinations when the appropriate
> keyboard or medium is not available:

For German, you can use the "german2" stemmer, which handles the
transliteration you describe.
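
To illustrate the general idea (this is not how "german2" is implemented,
just a minimal sketch of folding both spellings to one form before
indexing and before querying), something like the following could be used.
The name fold_spelling and the mapping table are assumptions made up for
this example, and the Danish entries are included only because the
question mentioned Danish:

```python
# Hypothetical helper, not part of Xapian: fold special letters to their
# common two-letter spellings so that both variants yield the same term.
FOLD = {
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",  # German
    "æ": "ae", "ø": "oe", "å": "aa",              # Danish (assumed mapping)
}


def fold_spelling(word: str) -> str:
    """Lowercase the word and replace each special letter by its digraph."""
    return "".join(FOLD.get(ch, ch) for ch in word.lower())


if __name__ == "__main__":
    for w in ("Müller", "Mueller", "Straße", "Strasse", "Århus"):
        print(w, "->", fold_spelling(w))
```

The same function would have to be applied to query terms as well,
otherwise a search for one spelling still misses documents indexed under
the other.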

There's also unac for more general accent normalisation:

http://www.nongnu.org/unac/

There's actually a version 1.8.0 which isn't mentioned on that page
(though Debian packages it).  I'm not sure what's going on, but the
upstream page at http://www.senga.org/unac/ no longer exists.
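
For a rough idea of what general accent normalisation does, here is a
minimal sketch using only Python's standard unicodedata module.  It is
not unac and makes no claim to match unac's output; strip_accents is a
name invented for this example:

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters and drop the combining marks.

    A stdlib approximation of accent removal, not unac's algorithm.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for w in ("café", "Müller", "søster"):
        print(w, "->", strip_accents(w))
```

Note that this maps "ü" to "u" rather than "ue", and that letters with no
Unicode decomposition (such as "ø" or "ß") pass through unchanged, so it
complements rather than replaces language-specific transliteration.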

Cheers,
    Olly


