[Xapian-discuss] Chinese segmentation

戴优丽 daiyli1984 at gmail.com
Thu Apr 21 08:49:38 BST 2011

Hello, I have finished reading the papers, and I think it is time to design
my project.
The first step will be to determine whether the input characters are Chinese.
I saw in a past post that cjk-tokenizer only deals with UTF-8 and Unicode, but
there are also other encodings in use, such as GBK and Big5. Should I handle
only UTF-8 and Unicode, or do I need to support those other encodings as well?
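
To make the first step concrete, here is a minimal sketch in Python of what I
have in mind. It is only an illustration under some assumptions: the function
names are hypothetical, the ranges cover only the main CJK ideograph blocks,
and because those blocks are shared between Chinese, Japanese and Korean, the
check really detects Han characters rather than the Chinese language. GBK or
Big5 input would be converted to Unicode once at the boundary, so the rest of
the code only ever sees Unicode text.

```python
# Sketch only: helper names are hypothetical and the ranges are not exhaustive.
CJK_RANGES = (
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
)


def is_chinese_char(ch):
    """Return True if the single character ch lies in one of CJK_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def to_unicode(raw, encoding="utf-8"):
    """Decode raw bytes (e.g. 'gbk' or 'big5') into a Unicode string.

    The caller is assumed to know the source encoding; this does not guess it.
    """
    return raw.decode(encoding)


def chinese_runs(text):
    """Yield maximal runs of consecutive Chinese characters found in text."""
    run = []
    for ch in text:
        if is_chinese_char(ch):
            run.append(ch)
        elif run:
            yield "".join(run)
            run = []
    if run:
        yield "".join(run)
```

With this layout, a segmenter would only be handed the runs produced by
chinese_runs(), and supporting another encoding would mean passing a different
codec name to to_unicode() rather than changing the segmentation code.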
