[Xapian-discuss] TermGenerator question for the single quote character

tata 668 tata668 at gmail.com
Mon Apr 6 18:09:19 BST 2009


I tried it and can confirm that setting the stemmer to "french" doesn't help. 
"m'excite" is still indexed as "m'excite" and not as "m" and "excite".

If anyone has an idea how this could be fixed, I would really 
appreciate it!
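
One possible workaround, which nobody in this thread has confirmed, is to
pre-process the text before handing it to the TermGenerator, so that an
apostrophe between two letters becomes a space. The sketch below uses only
the Python standard library; the helper name split_on_apostrophes is
hypothetical and is not part of Xapian. It assumes that both the ASCII
apostrophe and the typographic one (U+2019) should be treated as word breaks.

```python
import re

# Match an apostrophe (ASCII ' or typographic U+2019) only when it sits
# between two word characters, so quotes around a whole word are left alone.
_APOSTROPHE_BETWEEN_LETTERS = re.compile(r"(?<=\w)['\u2019](?=\w)")


def split_on_apostrophes(text):
    """Return text with in-word apostrophes replaced by a single space.

    Hypothetical helper for illustration only; it is not a Xapian API.
    """
    return _APOSTROPHE_BETWEEN_LETTERS.sub(" ", text)


if __name__ == "__main__":
    # The cleaned string is what would then be passed to the indexer.
    print(split_on_apostrophes("Cela m'excite"))
```

The cleaned string could then be passed to the TermGenerator's index_text()
method, and the same function would presumably need to be applied to query
strings so that both sides are tokenised the same way. The trade-off is that
English contractions such as "don't" would also be split, so this is probably
only suitable for databases that hold French text.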

Thank you,

Julien



tata 668 wrote:
> I found it: 
> http://xapian.org/docs/apidoc/html/classXapian_1_1TermGenerator.html#f7d43aef10aa6b26ef853a0ae2695f83
>
> I'll try to set it to the french stemmer.
>
> Thanks
>
> Julien
>
>
>
> Olly Betts wrote:
>   
>> On Sun, Apr 05, 2009 at 07:18:08PM -0400, tata 668 wrote:
>>   
>>     
>>> I use the TermGenerator to index the french text "Cela m'excite" 
>>> (without the quotes). When I do a search for "excite" after this 
>>> indexing, I need it to be found. "excite" is a word on its own.
>>>
>>> Currently "excite" is not found but "m'excite" is...
>>>     
>>>       
>> In 1.0.0, we changed to treating apostrophes as part of a word, and
>> updated to a newer version of Snowball where the English stemmer
>> deals with them.
>>
>> I think the correct way for this to work is for the other stemmers
>> to also handle apostrophes (at least if their languages use them)
>> as otherwise the word tokenisation required depends on the stemmer.
>>
>>   
>>     
>>> Is there a setting I'm missing so that the single quote character acts as 
>>> a word delimiter?
>>>     
>>>       
>> No, there's no such setting currently.
>>
>> Cheers,
>>     Olly
>>
>>   
>>     
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>
>   

