[Xapian-discuss] UTF-8 Corruption
Colin Bell
colinabell at gmail.com
Thu Mar 20 14:24:52 GMT 2008
Thanks, James.
It's looking good:
http://lxr.mozilla.org/mozilla/source/intl/chardet/
On 20 Mar 2008, at 14:11, James Aylett wrote:
> On Thu, Mar 20, 2008 at 02:08:00PM +0000, Colin Bell wrote:
>
>>> There are ways to detect the character set of a file, though not
>>> always 100% reliably.
>>
>> Can anyone recommend some C++ code to do this?
>
> I assume, but don't know, that the Firefox/Mozilla ``magic'' charset
> detector is in C or C++ (the one that Mark Pilgrim ported to Python).
>
> J
>
> --
> /--------------------------------------------------------------------------\
> James Aylett
> xapian.org
> james at tartarus.org
> uncertaintydivision.org
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
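
For anyone finding this thread later: as a cheap first step before reaching for a full statistical detector, it can help to check whether the bytes are well-formed UTF-8 at all. The sketch below is not the Mozilla detector and is not taken from its source; it is a minimal strict validator, and the function name looks_like_utf8 is made up for illustration. It assumes the whole input is already in memory as a std::string.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xapian or Mozilla chardet).
// Returns true if `bytes` is well-formed UTF-8: rejects stray or missing
// continuation bytes, overlong forms, UTF-16 surrogates, and code points
// above U+10FFFF.
bool looks_like_utf8(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        std::size_t extra = 0;     // continuation bytes expected after b0
        unsigned long cp = 0;      // code point being decoded
        unsigned long min_cp = 0;  // smallest code point valid at this length

        if (b0 < 0x80) {           // plain ASCII byte
            ++i;
            continue;
        } else if ((b0 & 0xE0) == 0xC0) {
            extra = 1; cp = b0 & 0x1F; min_cp = 0x80;
        } else if ((b0 & 0xF0) == 0xE0) {
            extra = 2; cp = b0 & 0x0F; min_cp = 0x800;
        } else if ((b0 & 0xF8) == 0xF0) {
            extra = 3; cp = b0 & 0x07; min_cp = 0x10000;
        } else {
            return false;          // stray continuation or invalid lead byte
        }

        if (n - i <= extra) return false;  // sequence truncated at end of input

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp) return false;                   // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate range
        if (cp > 0x10FFFF) return false;                 // beyond Unicode

        i += extra + 1;
    }
    return true;
}
```

A true result is only a hint: pure ASCII passes too, and a short legacy-encoded file could validate by accident. A false result is the more useful signal, since it means the data should be run through a real detector or converted before indexing.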