[Xapian-discuss] Indexing Thai (was Re: Indexing Chinese)
Olly Betts
olly at survex.com
Mon Jul 10 23:38:59 BST 2006
On Fri, Jun 30, 2006 at 12:09:06AM +0800, epaulin wrote:
> The most common way to do Chinese word segmentation is called "Maximum
> Matching", take a look at this:
>
> http://acl.ldc.upenn.edu/C/C96/C96-1035.pdf
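For anyone unfamiliar with the term, here is a minimal sketch of greedy forward maximum matching. It is only an illustration of the general idea, not the method from the paper above and not anything Xapian provides; the function name `max_match`, its parameters, and the toy dictionary are all made up for this example.

```python
def max_match(text, dictionary, max_word_len=None):
    """Segment text by greedy forward maximum matching.

    `dictionary` is a set of known words (hypothetical toy data here).
    At each position the longest dictionary word is taken; if nothing
    matches, a single character is emitted as its own token.
    """
    if max_word_len is None:
        # Longest dictionary entry bounds the search window (at least 1).
        max_word_len = max(1, max((len(w) for w in dictionary), default=1))
    tokens = []
    i = 0
    while i < len(text):
        # Start with the longest candidate and shrink until it is a
        # dictionary word or only one character is left.
        j = min(len(text), i + max_word_len)
        while j > i + 1 and text[i:j] not in dictionary:
            j -= 1
        tokens.append(text[i:j])
        i = j
    return tokens


# Toy dictionary, assumed for illustration only.
words = {"研究", "研究生", "生命", "命", "的", "起源"}
result = max_match("研究生命的起源", words)
# result == ["研究生", "命", "的", "起源"]
```

Note that the greedy choice picks "研究生" first and so misses the more natural split "研究 / 生命 / 的 / 起源"; this kind of ambiguity is the usual weakness of plain maximum matching.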
Tangentially related, but I happened upon this Thai word segmentation
code (GPL licensed):
http://www.cs.cmu.edu/~paisarn/software.html
I've not tried it (or even downloaded it) - just sharing the URL as it
may be of interest to Xapian users.
Cheers,
Olly