On 6/28/06, Alex Deucher <alexdeucher at gmail.com> wrote:
> Has anyone ever indexed documents of Chinese characters? What's the
> best way to break down the text for indexing.

The most common way to do Chinese word segmentation is called "Maximum
Matching". Take a look at this:
http://acl.ldc.upenn.edu/C/C96/C96-1035.pdf
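
For what it's worth, here is a minimal sketch of the forward (greedy) variant of maximum matching in Python. It is not taken from the linked paper: the function name max_match, the max_len parameter, and the four-word toy dictionary are all made up for illustration, and a real indexer would load a full lexicon instead.

```python
def max_match(text, dictionary, max_len=None):
    """Segment text by greedy forward maximum matching.

    dictionary is a set of known words (a hypothetical toy lexicon here).
    max_len caps the candidate length; it defaults to the longest entry.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    max_len = max(1, max_len)  # always allow single-character fallback

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is emitted as-is when nothing longer matches.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy lexicon, assumed purely for this example.
toy_dictionary = {"研究", "研究生", "生命", "起源"}
print(max_match("研究生命起源", toy_dictionary))
# → ['研究生', '命', '起源']
```

Note that the greedy pass picks "研究生" first and so misses the more natural reading 研究 / 生命 / 起源. That kind of ambiguity is the usual weakness of plain maximum matching, so for indexing you may want to compare it against a backward pass or index overlapping candidates as well.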