[Xapian-discuss] UTF-8 becomes gibberish in searches
olly at survex.com
Wed Oct 24 16:16:19 BST 2007
On Wed, Oct 24, 2007 at 02:38:51PM +0200, robert wrote:
> In your .script, do you have unhtml for this field?
> In myhtmlparse.cc, line 42: charset = "ISO-8859-1";
> so all data from this field was converted to ISO-8859-1.
The original poster has now solved their problem: their PHP script
wasn't sending an HTTP "Content-Type" header that specified a
character set.
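For anyone hitting the same problem, the fix is to name the charset in the HTTP response header. In PHP that means calling header() with a Content-Type value before any output is sent. The minimal sketch below uses Python instead. `search_app` is a hypothetical WSGI endpoint standing in for the poster's script and is not Omega or Xapian code. The part that matters is the `charset=utf-8` parameter. Without it, the browser has to guess the page's encoding, and form input submitted from that page is encoded using whatever it guessed.

```python
from wsgiref.util import setup_testing_defaults


def search_app(environ, start_response):
    """Hypothetical WSGI search page standing in for the poster's PHP script."""
    # Non-ASCII text, encoded to bytes as UTF-8 before sending.
    body = "<p>Results for \u00abcaf\u00e9\u00bb</p>".encode("utf-8")
    headers = [
        # The fix: name the charset explicitly in the Content-Type header,
        # so the browser does not have to guess how to decode the bytes.
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    environ = {}
    setup_testing_defaults(environ)  # fill in the minimal WSGI environ keys
    seen = {}

    def start_response(status, headers):
        seen["headers"] = dict(headers)

    b"".join(search_app(environ, start_response))
    print(seen["headers"]["Content-Type"])  # text/html; charset=utf-8
```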
The line you indicate is just the default character set if none is
otherwise specified by an HTML document.
Files with XML declarations default to utf-8, or use the encoding
specified there, e.g.:
<?xml version="1.0" encoding="UTF-8"?>
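The XML-declaration rule can be sketched roughly as follows. This is an illustrative Python function and not Omega's parser, and `xml_declared_charset` is a hypothetical name. It only looks at a declaration at the very start of the raw bytes, after skipping a UTF-8 byte order mark. It returns the declared encoding if there is one and UTF-8 if the declaration names none. It returns `None` when there is no declaration at all, so that a caller can fall back to other rules.

```python
import re
from typing import Optional

# An XML declaration, which must sit at the very start of the document.
_XML_DECL = re.compile(rb"<\?xml\s(.*?)\?>", re.DOTALL)
# The optional encoding pseudo-attribute, single- or double-quoted.
_ENCODING = re.compile(rb"""encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1""")


def xml_declared_charset(raw: bytes) -> Optional[str]:
    """Return the charset implied by a leading XML declaration, or None."""
    if raw.startswith(b"\xef\xbb\xbf"):
        raw = raw[3:]  # skip a UTF-8 byte order mark
    decl = _XML_DECL.match(raw)
    if decl is None:
        return None  # no XML declaration: let other rules decide
    enc = _ENCODING.search(decl.group(1))
    # A declaration without an encoding attribute implies UTF-8.
    return enc.group(2).decode("ascii") if enc else "utf-8"


if __name__ == "__main__":
    print(xml_declared_charset(b'<?xml version="1.0" encoding="UTF-8"?><doc/>'))  # UTF-8
    print(xml_declared_charset(b'<?xml version="1.0"?><doc/>'))  # utf-8
```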
And we also honour "meta http-equiv", e.g.:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
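The `meta http-equiv` rule, with the ISO-8859-1 fallback, can be sketched in the same spirit using Python's standard `html.parser`. The names `detect_charset` and `_MetaCharsetFinder` are hypothetical. The sketch assumes the charset appears as a `charset=` parameter inside the tag's `content` attribute. It does not model the XML-declaration rule, and it does not reproduce the order in which Omega applies these checks.

```python
import re
from html.parser import HTMLParser
from typing import Optional

# Fallback used when nothing names a charset (the ISO-8859-1 default
# mentioned for myhtmlparse.cc in the quoted message).
DEFAULT_CHARSET = "ISO-8859-1"

# Pulls the value out of a "charset=..." parameter, optionally quoted.
_CHARSET_PARAM = re.compile(r"""charset\s*=\s*["']?([A-Za-z0-9._-]+)""", re.IGNORECASE)


class _MetaCharsetFinder(HTMLParser):
    """Hypothetical helper: records the first meta http-equiv charset seen."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser already lowercases tag and attribute names.
        if tag != "meta" or self.charset is not None:
            return
        values = {name: (value or "") for name, value in attrs}
        if values.get("http-equiv", "").lower() != "content-type":
            return
        match = _CHARSET_PARAM.search(values.get("content", ""))
        if match:
            self.charset = match.group(1)


def detect_charset(raw: bytes) -> str:
    """Return the meta-declared charset, or DEFAULT_CHARSET if there is none."""
    finder = _MetaCharsetFinder()
    # Latin-1 maps every byte to a character, so this decode cannot fail.
    finder.feed(raw.decode("latin-1"))
    finder.close()
    return finder.charset or DEFAULT_CHARSET


if __name__ == "__main__":
    page = b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
    print(detect_charset(page))  # UTF-8
```

Decoding the raw bytes as Latin-1 for this first pass is a deliberate choice. Every byte maps to some character, so parsing never fails, and the markup needed to find the tag reads the same in any ASCII-compatible encoding.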