[Xapian-discuss] Chinese, Japanese, Korean Tokenizer.

Kevin Duraj kevin.softdev at gmail.com
Tue Jun 5 22:37:27 BST 2007


Hi,

I am looking for a Chinese, Japanese and Korean tokenizer that could
be used to tokenize terms for CJK languages. I am not very familiar
with these languages, but I understand that a single symbol can
represent one or more words, which makes it more difficult to tokenize
text into searchable terms.

Lucene has a CJK Tokenizer ... and I am looking around to see if there
is something open source that we could use with Xapian.

http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/analysis/cjk/package-summary.html
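For illustration, here is a minimal sketch of the overlapping-bigram
approach (the same basic idea Lucene's CJK tokenizer takes): runs of
CJK characters are split into overlapping two-character terms, so no
dictionary-based word segmenter is needed. The code point ranges below
are a rough approximation of "CJK", not a complete definition, and the
function names are just placeholders.

    # Rough sketch only: bigram tokenization of CJK runs.
    def is_cjk(ch):
        cp = ord(ch)
        return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
                or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
                or 0xAC00 <= cp <= 0xD7A3)  # Hangul syllables

    def bigram_tokenize(text):
        terms = []
        run = ""                 # current run of consecutive CJK characters
        for ch in text + " ":    # trailing space flushes the final run
            if is_cjk(ch):
                run += ch
            else:
                if len(run) == 1:
                    terms.append(run)          # lone character kept as-is
                else:
                    # overlapping bigrams: chars 0-1, 1-2, 2-3, ...
                    terms.extend(run[i:i+2] for i in range(len(run) - 1))
                run = ""
        return terms

    print(bigram_tokenize("搜索引擎"))   # -> ['搜索', '索引', '引擎']

Bigramming inflates the index a little compared with true word
segmentation, but it gives reasonable recall for CJK text without
needing a per-language dictionary, which is presumably why Lucene
offers it.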

Cheers
  -Kevin Duraj


