[Xapian-discuss] UTF-8 Corruption

Colin Bell colinabell at gmail.com
Thu Mar 20 14:24:52 GMT 2008


Thanks, James.

It's looking good:

http://lxr.mozilla.org/mozilla/source/intl/chardet/


On 20 Mar 2008, at 14:11, James Aylett wrote:

> On Thu, Mar 20, 2008 at 02:08:00PM +0000, Colin Bell wrote:
>
>>> There are ways to detect the character set of a file, though not
>>> always 100% reliably.
>>
>> Can anyone recommend some C++ code to do this?
>
> I assume, but don't know, that the Firefox/Mozilla "magic" charset
> detector is in C or C++ (the one that Mark Pilgrim ported to Python).
>
> J
>
> --  
> /--------------------------------------------------------------------------\
>  James Aylett                                                  xapian.org
>  james at tartarus.org                             uncertaintydivision.org
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
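
For readers after a starting point in C++: the sketch below is not the Mozilla detector and does not guess between legacy charsets. It only checks whether a byte string is well-formed UTF-8, which is one cheap heuristic for spotting input that is probably in some other encoding before it gets indexed. The function name looks_like_utf8 is hypothetical and not part of Xapian or Mozilla; the rules it applies (no overlong forms, no surrogates, nothing above U+10FFFF) follow the UTF-8 definition in RFC 3629.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Rejects stray or missing continuation bytes, overlong encodings,
// UTF-16 surrogates, and code points above U+10FFFF.
bool looks_like_utf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) {  // plain ASCII byte
            ++i;
            continue;
        }

        std::size_t extra = 0;     // continuation bytes expected after the lead
        unsigned long cp = 0;      // code point being decoded
        unsigned long min_cp = 0;  // smallest code point allowed at this length
        if ((lead & 0xE0) == 0xC0) {
            extra = 1; cp = lead & 0x1F; min_cp = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2; cp = lead & 0x0F; min_cp = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3; cp = lead & 0x07; min_cp = 0x10000;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }

        if (extra >= n - i) {
            return false;  // sequence is cut off at the end of the input
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;  // expected a 10xxxxxx continuation byte
            }
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += extra + 1;
    }
    return true;
}
```

A return value of true does not prove the text was meant as UTF-8 (pure ASCII passes trivially), and a return value of false says nothing about which encoding it actually is. A typical use would be to call it on each document's text and send anything that fails through a real charset detector or a configured fallback conversion.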



