[Xapian-discuss] Trouble with German language indexing/searching

Jim Lynch jim at fayettedigital.com
Wed Feb 15 21:30:05 GMT 2006


Hi Olly,

I'd appreciate a copy of that patch.  If I could, I'd reverse the
transliteration, but it looks to be one-way only.

Thanks,
Jim.

Olly Betts wrote:

>On Wed, Feb 15, 2006 at 11:51:29AM -0500, Jim Lynch wrote:
>
>>OK, not entirely.  When I search for für using Omega, the term that gets 
>>returned in the resultant xml is
>><queryterm term="fuer" show="fuer" freq="17"/>
>>
>>I'm using a simple script to generate contextual samples, and obviously
>>it doesn't work, because the term "fuer" never appears in the documents.
>>So where do I go to tell Xapian that I've got an extended character set?
>
>Currently the QueryParser performs transliteration of accented
>characters (assuming character set iso-8859-1), and this is done 
>even when stemming is disabled.  In this case, "u-umlaut" is converted
>to "ue".
>
>This has been discussed before a few times, for example:
>
>http://thread.gmane.org/gmane.comp.search.xapian.general/1815
>
>I'm planning to revisit this area before 1.0.  I suspect that I'll
>remove the transliteration, and any that makes sense to keep will
>be pushed into the stemmers (since it's a form of normalisation).
>
>Meanwhile, it's not hard to disable if you're happy to run a patched
>version of xapian (I thought I'd sent such a patch to the list but I
>can't find it right now).
>
>Cheers,
>    Olly
>
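
Since the transliteration can't be reversed ("für" and "fuer" both end up
as "fuer"), one possible workaround for the sample script is to fold the
document text the same way and remember where each folded character came
from.  The sketch below is only an illustration of that idea.  It is not
Xapian or Omega code, the function names are made up, and the folding table
is an assumption: only u-umlaut -> "ue" is confirmed in this thread.  It
also assumes the text has already been decoded from iso-8859-1 into a
Python string.

```python
# Hypothetical folding table. Only the u-umlaut -> "ue" entry comes from the
# thread; the other entries are assumptions for illustration.
FOLD = {"ü": "ue", "ä": "ae", "ö": "oe", "ß": "ss"}


def fold_with_offsets(text):
    """Return (folded, offsets), where offsets[i] is the index in text of
    the character that produced folded[i]."""
    folded = []
    offsets = []
    for index, char in enumerate(text):
        lowered = char.lower()
        replacement = FOLD.get(lowered, lowered)
        for out_char in replacement:
            folded.append(out_char)
            offsets.append(index)
    return "".join(folded), offsets


def find_samples(text, term, context=20):
    """Return snippets of the ORIGINAL text around each place where term
    occurs in the folded text (plain substring match, not word match)."""
    folded, offsets = fold_with_offsets(text)
    samples = []
    start = folded.find(term)
    while start != -1:
        # Map the first and last folded characters back to the original.
        first = offsets[start]
        last = offsets[start + len(term) - 1]
        samples.append(text[max(0, first - context):last + 1 + context])
        start = folded.find(term, start + 1)
    return samples
```

With this, the term reported by Omega can be looked up directly, for
example find_samples(document_text, "fuer"), and the snippet that comes back
still contains the original spelling.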



