[Xapian-discuss] chinese/japanese index support

Rick Olson rick at napalmriot.com
Tue Feb 26 09:27:36 GMT 2008


chun yu wrote:
> Hi, all
>  
> I am studying the Xapian project.
>  
> I am wondering whether version 1.0.5 supports Chinese/Japanese indexing.
>  
> If so, could you please point me to the code in the project that implements it?
>  
> Or, if not, how can I implement support for indexing Chinese?
>  
>  
> Thanks a lot!
>   
Hello,

For indexing Chinese/Japanese/Korean data, I would suggest a 
product called Senna (http://qwik.jp/senna/).  It is also free 
(if that's what floats your boat), but it is not part of Xapian.  
I haven't yet successfully used Xapian to index CJK text in a 
production environment, and from my experience so far it's not very 
convenient for that job (there is no stemming support that I can see, 
and it relies on spaces to separate words, which much CJK text doesn't 
have!).

Technically, however, indexing has _functioned_ in about 90% of the 
cases I've tested that include Asian characters; Xapian just isn't 
tailored to that use.

Perhaps one of the core developers can shed some light on this 
situation, but I believe my personal tests are accurate.  There 
are other solutions as well that cater to Asian character sets.

Regards,
Rick
