[Xapian-discuss] using xapian for indexing mails [SOLVED]

Olly Betts olly at survex.com
Tue Sep 2 04:37:45 BST 2008


On Mon, Sep 01, 2008 at 03:57:29AM -0600, Rusty Conover wrote:
> > I noticed that the stemming is language-specific (understandably); is
> > there some recommended way to guess the language of a blob of text?  
> 
> n-gram analysis works pretty well.
[...]
> See:
> http://www.rubyinside.com/whatlanguage-ruby-language-detection-library-1085.html
> http://code.activestate.com/recipes/326576/

Also:

http://odur.let.rug.nl/~vannoord/TextCat/
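
For anyone who wants to see the idea before picking a library, here is a
rough sketch of rank-order n-gram guessing in Python. It is not taken from
any of the tools linked above; the function names (ngram_profile,
out_of_place, guess_language) and the tiny SAMPLES texts are made up for
illustration, and real language profiles would need to be built from far
more text per language.

```python
from collections import Counter


def ngram_profile(text, n_max=3, top=300):
    """Map the most frequent character n-grams (1..n_max) to their rank."""
    # Normalise case and whitespace, then pad so word edges form n-grams too.
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    # Rank 0 is the most frequent n-gram.
    return {gram: rank for rank, (gram, _) in enumerate(counts.most_common(top))}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams missing from the language profile
    get a fixed maximum penalty."""
    max_penalty = len(lang_profile)
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


def guess_language(text, profiles):
    """Return the key of the profile closest to the text's own profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))


# Hypothetical toy training data; far too small for real use.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then it "
               "runs away from the house with the other animals",
    "french": "le renard brun saute par dessus le chien paresseux et puis "
              "il s'enfuit de la maison avec les autres animaux",
    "german": "der schnelle braune fuchs springt über den faulen hund und "
              "dann läuft er mit den anderen tieren vom haus weg",
}
PROFILES = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}
```

The keys in PROFILES are chosen to match Xapian's stemmer names, so the
result of guess_language(text, PROFILES) could be handed straight to the
Xapian::Stem constructor. Short texts will be guessed less reliably than
long ones, so it is probably worth falling back to a default language when
the best and second-best distances are close.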

Cheers,
    Olly


