[Xapian-discuss] Indexing Thai (was Re: Indexing Chinese)

Olly Betts olly at survex.com
Mon Jul 10 23:38:59 BST 2006

On Fri, Jun 30, 2006 at 12:09:06AM +0800, epaulin wrote:
> The most common way to do Chinese word segmentation is called "Maximum
> Matching", take a look at this:
> http://acl.ldc.upenn.edu/C/C96/C96-1035.pdf

Tangentially related, but I happened upon this Thai word segmentation
code (GPL licensed):


I've not tried it (or even downloaded it) - just sharing the URL as it
may be of interest to Xapian users.
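
For anyone unfamiliar with the "Maximum Matching" approach mentioned in the
quoted message, here is a minimal sketch of the simple forward (greedy,
longest-match-first) variant. It is not taken from the linked paper or from
any Xapian code. The function name `max_match`, its parameters, and the toy
dictionary are all hypothetical, and a real segmenter would need a proper
lexicon and some handling for ambiguity.

```python
def max_match(text, dictionary, max_len=None):
    """Segment text by forward maximum matching against a set of known words.

    Hypothetical helper for illustration only: at each position, take the
    longest dictionary word that starts there; if nothing matches, emit a
    single character and move on.
    """
    if max_len is None:
        # Longest dictionary entry bounds the window; never go below 1.
        max_len = max(1, max((len(w) for w in dictionary), default=1))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until there is a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary (an assumption, not a real lexicon).
words = {"研究", "研究生", "生命", "的", "起源"}
print(max_match("研究生命的起源", words))
```

With this toy dictionary the greedy choice picks "研究生" first, which leaves
"命" stranded as a single character, so it also shows the well-known weakness
of the plain forward variant on ambiguous input.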

