[Xapian-discuss] UTF8 support plans (without stemming)

Alexandre Xlex0x835 at rambler.ru
Wed Apr 27 21:09:26 BST 2005


On Apr 27, 2005, at 23:47, rm at fabula.de wrote:

> On Wed, Apr 27, 2005 at 11:32:30PM +0400, Alexandre wrote:
>> Good day,
>>
>> are there any plans to support UTF-8 (I mean the lib core,
>> not stemming)?
>
> What exactly do you mean by UTF-8 support? You can pretty much stuff
> anything into a xapian database (see some recent posts in this list).
> But -- without stemming statistical information retrieval doesn't really
> work as expected in most western languages :-/

Ralf, do you mean this post
(http://lists.tartarus.org/pipermail/xapian-discuss/2005-April/000821.html)?

If so, "query parser ... currently assume latin1" - that's not very
good, is it?

Hm, and could you please tell me more about how stemming influences IR in
western languages? Does it matter only for probabilistic IR, or for vector
search too?

One more question (not exactly on topic): why does Xapian stick
to the probabilistic approach? Are there perhaps some links or docs on
the history behind it?

Thank you in advance,
Regards,
/Alexandre.




