[Xapian-discuss] Indexing Chinese

epaulin epaulin at gmail.com
Thu Jun 29 17:09:06 BST 2006


On 6/28/06, Alex Deucher <alexdeucher at gmail.com> wrote:
> Has anyone ever indexed documents of Chinese characters?  What's the
> best way to break down the text for indexing.
>

The most common approach to Chinese word segmentation is called
"Maximum Matching". Take a look at this paper:

http://acl.ldc.upenn.edu/C/C96/C96-1035.pdf
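
For a rough idea of how this works, here is a minimal sketch of the plain
forward variant: at each position, take the longest dictionary word that
matches, and fall back to a single character if nothing matches. This is an
illustration under assumptions, not the exact method from the paper above.
The function name max_match, its parameters, and the toy dictionary are all
made up for the example; a real setup would need a proper word list.

```python
def max_match(text, dictionary, max_word_len=None):
    """Greedy forward maximum matching (illustrative sketch only).

    text         -- the unsegmented string
    dictionary   -- a set of known words (hypothetical word list)
    max_word_len -- longest candidate to try; defaults to the longest
                    word in the dictionary
    """
    if max_word_len is None:
        max_word_len = max((len(w) for w in dictionary), default=1)
    max_word_len = max(1, max_word_len)  # always try at least one character

    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one matches.
        for length in range(min(max_word_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            # A single character is always accepted as a fallback.
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Toy dictionary, invented for this example.
words = {"中国", "中国人", "人民", "银行"}
print(max_match("中国人民银行", words))  # ['中国人', '民', '银行']
```

Note that the greedy choice picks "中国人" first and leaves "民" stranded,
where "中国 / 人民 / 银行" would be the better split. That is the known
weakness of plain maximum matching. The resulting tokens could then be
handed to the indexer as terms.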
