[Xapian-discuss] using xapian for indexing mails [SOLVED]
Olly Betts
olly at survex.com
Tue Sep 2 04:37:45 BST 2008
On Mon, Sep 01, 2008 at 03:57:29AM -0600, Rusty Conover wrote:
> > I noticed that the stemming is language-specific (understandably); is
> > there some recommended way to guess the language of a blob of text?
>
> n-gram analysis works pretty well.
[...]
> See:
> http://www.rubyinside.com/whatlanguage-ruby-language-detection-library-1085.html
> http://code.activestate.com/recipes/326576/
Also:
http://odur.let.rug.nl/~vannoord/TextCat/
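
To give a rough idea of what n-gram analysis means here, below is a minimal
sketch of the rank-order ("out-of-place") profile comparison that tools in
the spirit of TextCat use. It is not code from any of the linked projects.
The function names (ngram_profile, out_of_place, guess_language) and the
SAMPLES dictionary are made up for illustration, and the one-sentence
training texts are toy data; a usable guesser would need profiles built
from much larger text per language.

```python
from collections import Counter


def ngram_profile(text, max_n=3, top=300):
    """Map each of the `top` most frequent character n-grams to its rank (0 = most common)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # mark word boundaries so prefixes/suffixes count
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return {gram: rank for rank, gram in enumerate(ranked[:top])}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from lang_profile get a fixed penalty."""
    max_penalty = len(lang_profile)
    total = 0
    for gram, rank in doc_profile.items():
        if gram in lang_profile:
            total += abs(rank - lang_profile[gram])
        else:
            total += max_penalty
    return total


# Toy training data (assumption): far too small for real use.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is what the people said",
    "de": "der schnelle braune fuchs springt nicht und das ist es was die leute gesagt haben",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et voici ce que les gens ont dit",
}

PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Return the key of the language profile closest to the text's own profile."""
    doc_profile = ngram_profile(text)
    return min(PROFILES, key=lambda lang: out_of_place(doc_profile, PROFILES[lang]))


if __name__ == "__main__":
    print(guess_language("what did the people say about the dog"))
```

The guessed code could then be used to choose which stemmer to apply when
indexing each mail (Xapian's Stem takes a language name such as "english").
Very short texts give unreliable guesses, so it is probably worth falling
back to no stemming when the best and second-best distances are close.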
Cheers,
Olly