[Xapian-discuss] chinese/japanese index support
Rick Olson
rick at napalmriot.com
Tue Feb 26 09:27:36 GMT 2008
chun yu wrote:
> Hi, all
>
> I am studying the Xapian project.
>
> I am wondering whether version 1.0.5 supports Chinese/Japanese indexing.
>
> If so, could you please point me to the code in the project that implements it?
>
> If not, how can I implement support for indexing Chinese?
>
>
> Thanks a lot!
>
Hello,
For indexing Chinese/Japanese/Korean (CJK) data, I would suggest a
product called Senna (http://qwik.jp/senna/). It is also free software
(if that matters to you), but it is a separate project, not part of Xapian.
I haven't yet successfully used Xapian to index CJK text in a
production environment, and from my experience so far it is not very
convenient for that purpose: there is no stemming support for these
languages that I can see, and splitting text into terms depends on
spaces, which CJK text often lacks.
Technically, though, indexing has _functioned_ in about 90% of the
cases I've tested that include Asian characters; Xapian is just not
tailored to that use.
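To illustrate the problem with spaces, here is a minimal sketch of one common workaround: breaking each run of CJK characters into overlapping two-character terms before the text reaches a space-based indexer. This is not Xapian or Senna code, and it is not what I tested; the function names `is_cjk` and `tokenize` and the code point ranges are assumptions chosen for illustration, and the ranges cover only the most common blocks.

```python
def is_cjk(ch):
    """Rough check (assumed ranges): common ideographs, kana, and hangul."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0xAC00 <= cp <= 0xD7AF   # Hangul syllables
    )


def tokenize(text):
    """Split text into terms.

    Non-CJK text is split on whitespace. Each run of CJK characters is
    turned into overlapping bigrams; a run of one character is kept as is.
    """
    tokens = []
    word = []  # pending non-CJK characters
    run = []   # pending CJK characters

    def flush_word():
        if word:
            tokens.append("".join(word))
            word.clear()

    def flush_run():
        if len(run) == 1:
            tokens.append(run[0])
        else:
            # Emits nothing when the run is empty.
            tokens.extend(a + b for a, b in zip(run, run[1:]))
        run.clear()

    for ch in text:
        if is_cjk(ch):
            flush_word()
            run.append(ch)
        else:
            flush_run()
            if ch.isspace():
                flush_word()
            else:
                word.append(ch)

    flush_word()
    flush_run()
    return tokens


if __name__ == "__main__":
    print(tokenize("Xapian 全文検索 test"))
```

The resulting terms could then be joined with spaces and handed to whatever indexer you use. Bigrams make the index larger and can match across real word boundaries, so a proper segmenter is likely the better choice where one is available.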
Perhaps one of the core developers can shed more light on this, but I
believe my own tests bear this out. There are also other solutions
that cater to Asian character sets.
Regards,
Rick