[Xapian-discuss] Indexing Chinese

epaulin epaulin at gmail.com
Thu Jun 29 17:09:06 BST 2006


On 6/28/06, Alex Deucher <alexdeucher at gmail.com> wrote:
> Has anyone ever indexed documents of Chinese characters?  What's the
> best way to break down the text for indexing?
>

The most common approach to Chinese word segmentation is called "Maximum
Matching"; take a look at this paper:

http://acl.ldc.upenn.edu/C/C96/C96-1035.pdf
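For illustration, here is a minimal Python sketch of the two simplest greedy variants of maximum matching, forward (left to right) and backward (right to left). It assumes a tiny hand-made dictionary (`toy_dictionary`, hypothetical) and a maximum word length of 4 characters. A real segmenter would use a large lexicon and usually extra disambiguation rules, and this is not necessarily the exact algorithm described in the linked paper.

```python
from typing import List, Set


def forward_maximum_matching(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Scan left to right; at each position take the longest dictionary word
    starting there, falling back to a single character if nothing matches."""
    words: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        end = min(i + max_len, len(text))
        while end > i + 1 and text[i:end] not in dictionary:
            end -= 1
        # Either a dictionary word was found, or end == i + 1 (single char).
        words.append(text[i:end])
        i = end
    return words


def backward_maximum_matching(text: str, dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Same greedy idea, but scanning right to left from the end of the text."""
    words: List[str] = []
    j = len(text)
    while j > 0:
        # Try the longest candidate ending at j first, shrinking from the left.
        start = max(j - max_len, 0)
        while start < j - 1 and text[start:j] not in dictionary:
            start += 1
        words.append(text[start:j])
        j = start
    words.reverse()  # words were collected from the end backwards
    return words


# Hypothetical toy lexicon, just large enough for the example sentence.
toy_dictionary = {"研究", "研究生", "生命", "的", "起源"}

sentence = "研究生命的起源"  # "study the origin of life"
print(forward_maximum_matching(sentence, toy_dictionary))   # ['研究生', '命', '的', '起源']
print(backward_maximum_matching(sentence, toy_dictionary))  # ['研究', '生命', '的', '起源']
```

The example sentence shows the classic weakness of greedy matching: the forward pass grabs 研究生 ("graduate student") and strands 命, while the backward pass recovers 研究 / 生命 ("study" / "life"). Either way, each resulting word could then be indexed as a separate term.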
