[Xapian-discuss] UTF8 support plans (without stemming)
Alexandre
Xlex0x835 at rambler.ru
Thu Apr 28 08:08:28 BST 2005
On Apr 28, 2005, at 00:17, rm at fabula.de wrote:
>> If so, "query parser ... currently assume latin1" - that's not very
>> good, is it?
>
> Hmm. Depends on what you want/need to do. I personally can't see why
> there even _is_ a query parser in Xapian core. After all, the query
> language really depends on the application ...
To be honest, I didn't dig into the library; I'm just trusting the bug
report... =)
Anyway, when an application or library was developed to support only
one language (American English), it's usually very hard to make it work
with other languages (Russian, for example): there are lots of
problems inside...
>> Hm, and can you tell me more, please, about the influence of stemming
>> on IR in western languages? Is it only relevant to probabilistic IR,
>> or to vector search too?
>>
>> And one more question (not exactly on the subject): why does Xapian
>> stick to the probabilistic approach? Are there perhaps some historical
>> links/docs?
>
> Well, these two questions relate to each other: Xapian is strong in
> 'probabilistic IR' and that approach kind of needs some sort of
> stemming.
> I can't speak for the Xapian developers (nor for the library's ancestry
> in the guts of Muscat) - from your question I infer that you seem to
> think that 'probabilistic IR' is kind of outdated?
I'm not an expert, so I have no moral right to claim that
'probabilistic IR' is kind of outdated.
I just suppose that a computer works well with lots of data, while the
human brain is what makes decisions. No, I'm not for boolean search,
but I just don't like the probabilistic approach too much (when the
machine tries to be smart)... I may be (and probably am) absolutely
wrong, which is why I'm interested in why people choose such an approach.
Regards,
/Alexandre.