[Xapian-discuss] UTF-8 Corruption

James Aylett james-xapian at tartarus.org
Thu Mar 20 14:11:06 GMT 2008


On Thu, Mar 20, 2008 at 02:08:00PM +0000, Colin Bell wrote:

> > There are ways to detect the character set of a file, though not  
> > always 100% reliably.
> 
> Can anyone recommend some C++ code to do this?

I assume, but don't know, that the Firefox/Mozilla ``magic'' charset
detector is in C or C++ (the one that Mark Pilgrim ported to Python).
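
If the only question is "is this file UTF-8 or something else?", a
much cruder check may be enough: test whether the bytes are well-formed
UTF-8, and treat anything that fails as a legacy 8-bit encoding. The
following is a minimal sketch of that idea only. It is not the Mozilla
detector, and looks_like_utf8 is a name made up for illustration. It
rejects overlong forms, surrogates, and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xapian or Mozilla): returns true if
// `data` is well-formed UTF-8. Pure ASCII input also returns true.
bool looks_like_utf8(const std::string& data) {
    const std::size_t n = data.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(data[i]);
        std::size_t extra = 0;     // continuation bytes expected
        unsigned long cp = 0;      // code point being decoded
        unsigned long min_cp = 0;  // smallest value allowed at this length

        if (c < 0x80) { ++i; continue; }  // plain ASCII byte
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min_cp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte

        if (extra > n - i - 1) return false;  // sequence truncated at end

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(data[i + k]);
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min_cp) return false;                    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                  // beyond Unicode
        i += extra + 1;
    }
    return true;
}
```

A pass does not prove the data is UTF-8, but text in ISO-8859-1 or
similar that contains any non-ASCII characters rarely validates by
accident. A failure tells you only that it is not UTF-8, not which
charset it actually is; for that you need a statistical detector.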

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


