[Xapian-discuss] Chinese segmentation

戴优丽 daiyli1984 at gmail.com
Thu Apr 21 08:49:38 BST 2011

Previous message: [Xapian-discuss] Merge databases
Next message: [Xapian-discuss] Chinese segmentation

Hello, I have finished reading the papers, and I think it is time to design
my project.
The first step will be to determine whether the input characters are Chinese.
I saw in a past post that cjk-tokenizer only handles UTF-8 and Unicode, but
I see there are other encodings such as GBK and Big5. I am wondering: should
I just deal with UTF-8 and Unicode?
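
To make the question concrete, here is a rough sketch of what I have in mind, written in Python only for illustration. The function names and the choice of code point ranges are my own assumptions, not anything taken from cjk-tokenizer or Xapian. The idea is to convert any GBK or Big5 input to Unicode at the boundary, so that the detection step only ever sees Unicode text. Note that these ranges also cover Japanese kanji, so code points alone cannot tell Chinese from Japanese.

```python
# Assumed ranges: the main CJK Unified Ideographs block, Extension A, and
# the Compatibility Ideographs block. Later extensions are left out here.
CJK_RANGES = (
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
)


def is_cjk_ideograph(ch):
    """Return True if the single character falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def contains_chinese(text):
    """Return True if any character in the Unicode string is an ideograph."""
    return any(is_cjk_ideograph(ch) for ch in text)


def to_unicode(raw, encoding="utf-8"):
    """Decode bytes using an encoding the caller names (e.g. "gbk", "big5").

    The encoding is not guessed here; the caller is assumed to know it.
    """
    return raw.decode(encoding)
```

With this split, legacy input would be handled as `contains_chinese(to_unicode(data, "gbk"))`, and everything after the decode step stays Unicode-only.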

