[Xapian-discuss] chinese/japanese index support

Jean-Francois Dockes jean-francois.dockes at wanadoo.fr
Thu Mar 6 08:16:36 GMT 2008


Fabrice Colin writes:
 > Yung-chung Lin wrote a CJKV n-gram tokenizer. The source is here:
 > http://svn.berlios.de/wsvn/dijon/trunk/cjkv/?rev=0&sc=1
 > It's not tied to Xapian in particular. It needs libunicode 0.4 or glib.
 > 
 > I make use of it in Pinot, to generate terms when indexing CJKV documents,
 > and at search time to pre-process CJKV queries before feeding them to the
 > QueryParser.

Just for the record, Recoll also has limited n-gram-based CJK support. It
does not reuse Yung-Chung Lin's code, although that code was the initial
inspiration.

It's relatively primitive, but the few users tell me that it is at least
better than nothing and "good enough" in many cases.
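
For readers unfamiliar with the approach, here is a minimal sketch of n-gram
term generation for CJK text. It is not the code used in Recoll or Pinot, nor
Yung-Chung Lin's tokenizer. The function name cjk_ngrams, the choice of
bigrams as the default, and the small set of Unicode ranges treated as "CJK"
are all assumptions made for illustration.

```python
import re

# Assumption: these ranges stand in for "CJK" in this sketch
# (CJK Unified Ideographs, Hiragana, Katakana, Hangul syllables).
# A real tokenizer would cover more blocks.
_CJK = "\u4e00-\u9fff\u3040-\u309f\u30a0-\u30ff\uac00-\ud7a3"

# Either a run of CJK characters, or a run of other word characters.
_TOKEN_RE = re.compile(rf"(?P<cjk>[{_CJK}]+)|(?P<word>[^\W{_CJK}]+)")


def cjk_ngrams(text, n=2):
    """Return index terms: overlapping n-grams for CJK runs,
    lowercased whole words for everything else."""
    terms = []
    for match in _TOKEN_RE.finditer(text):
        run = match.group()
        if match.lastgroup == "cjk":
            if len(run) < n:
                # Run too short to form an n-gram: keep it as a single term.
                terms.append(run)
            else:
                # Slide a window of n characters over the run.
                terms.extend(run[i:i + n] for i in range(len(run) - n + 1))
        else:
            terms.append(run.lower())
    return terms


if __name__ == "__main__":
    print(cjk_ngrams("Xapian 中文索引 test"))
```

The same function would be applied both to document text at indexing time
and to the query string at search time, so that both sides produce matching
terms. How the resulting terms are then passed to the indexer or the query
parser depends on the application and is not shown here.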

J.F. Dockes


