[Xapian-discuss] chinese/japanese index support
Jean-Francois Dockes
jean-francois.dockes at wanadoo.fr
Thu Mar 6 08:16:36 GMT 2008
Fabrice Colin writes:
> Yung-Chung Lin wrote a CJKV n-gram tokenizer. The source is here:
> http://svn.berlios.de/wsvn/dijon/trunk/cjkv/?rev=0&sc=1
> It's not tied to Xapian in particular. It needs libunicode 0.4 or glib.
>
> I make use of it in Pinot, to generate terms when indexing CJKV documents,
> and at search time to pre-process CJKV queries before feeding them to the
> QueryParser.
Just for the record, Recoll also has limited n-gram-based CJK support. It is
not based on Yung-Chung Lin's code, although that code was the initial
inspiration. It's relatively primitive, but the few people who use it tell me
that it is at least better than nothing and "good enough" in many cases.
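
For readers unfamiliar with the approach, here is a minimal sketch of what
n-gram term generation for CJK text can look like. It is not the code from
Recoll, Pinot, or Yung-Chung Lin's tokenizer. The names (CJK_RANGES, is_cjk,
char_class, ngram_terms) are hypothetical, and the list of Unicode ranges is a
simplifying assumption that a real tokenizer would extend.

```python
from itertools import groupby

# Assumption: a deliberately small set of ranges (Hiragana and Katakana,
# CJK Unified Ideographs, Hangul syllables). Real tokenizers cover more.
CJK_RANGES = [(0x3040, 0x30FF), (0x4E00, 0x9FFF), (0xAC00, 0xD7A3)]


def is_cjk(ch):
    """Return True if the character falls in one of the assumed CJK ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def char_class(ch):
    """Classify a character as CJK, ordinary word character, or separator."""
    if is_cjk(ch):
        return "cjk"
    if ch.isalnum():
        return "word"
    return "sep"


def ngram_terms(text, n=2):
    """Split text into terms: whole lowercased words for non-CJK runs,
    overlapping n-grams for runs of CJK characters."""
    terms = []
    for cls, group in groupby(text, key=char_class):
        run = "".join(group)
        if cls == "word":
            terms.append(run.lower())
        elif cls == "cjk":
            if len(run) < n:
                # A run shorter than n is kept as a single term.
                terms.append(run)
            else:
                terms.extend(run[i:i + n] for i in range(len(run) - n + 1))
    return terms


print(ngram_terms("Xapian 中文搜索"))
```

The same function would be applied to documents at index time and to the
user's query at search time, so that both sides produce matching terms.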
J.F. Dockes