[Xapian-discuss] UTF-8 becomes gibberish in searches

robert robert at weborama.fr
Wed Oct 24 13:38:51 BST 2007


Olly Betts wrote:
> On Thu, Oct 18, 2007 at 12:47:21PM -0700, athlon athlonf wrote:
>   
>> I'm using dbi2omega and scriptindex to index a database with Chinese
>>  characters.
>> Searches are done with php4-bindings.
>>
>> While the index-file is in utf8, the results from the searches are
>>  gibberish.
>>
>> These characters (changed to htmlencoding for this message)
>> ?????? becomes something like this: å??äº???ä¸
>>     
>
> I just see "?" and inverse "?" here in mutt I'm afraid...
>
>   
>> What am I doing wrong here? Is it the indexing, or is it the searching?
>>     
>
> You need to step through the process, checking that everything is OK
> after each step.  It could be dbi2omega is wrong, or scriptindex, or
> xapian itself, or the PHP bindings.
>
> First of all, I'd run dbi2omega redirected to a file, and then see if
> the UTF-8 is correct in that file.
>
>   
>>  How can I check if the database is indeed in utf-8?
>>     
>
> Use the "delve" utility (in xapian-core, examples/delve) to look at the
> terms for a few documents.
>
> If both dbi2omega and the database look OK, then it's probably the PHP
> bindings.  If you're writing the results as a web page, have you set
> the character set of the webpage to UTF-8 correctly?  Check what your
> web browser says its character set is.
>
> Cheers,
>     Olly
>   
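Olly's step-by-step check can be scripted. Below is a minimal Python 3 sketch (not part of Xapian or Omega) that reports the first invalid UTF-8 byte in the dbi2omega output redirected to a file, and lists any non-UTF-8 terms in one document, roughly what you would look for with delve. It assumes the Xapian Python bindings are installed (they are only needed for the second check); the function names, and the file name, database path and document id in the usage note, are hypothetical.

```python
try:
    import xapian  # optional: Xapian's Python bindings, only used by check_document_terms()
except ImportError:
    xapian = None


def first_invalid_utf8(data: bytes):
    """Return (byte_offset, reason) for the first invalid UTF-8 sequence, or None if valid."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


def check_dump_file(path: str) -> bool:
    """Step 1: check that the dbi2omega output (redirected to a file) is valid UTF-8."""
    # Read raw bytes so that no default text encoding can hide the problem.
    with open(path, "rb") as f:
        data = f.read()
    problem = first_invalid_utf8(data)
    if problem is None:
        return True
    offset, reason = problem
    context = data[max(0, offset - 10):offset + 10]  # a few bytes either side
    print(f"{path}: invalid UTF-8 at byte {offset} ({reason}): {context!r}")
    return False


def non_utf8_terms(terms):
    """Return the terms (as bytes) that are not valid UTF-8."""
    return [t for t in terms if first_invalid_utf8(t) is not None]


def check_document_terms(db_path: str, docid: int) -> bool:
    """Step 2: a scripted version of eyeballing one document's terms with delve."""
    if xapian is None:
        raise RuntimeError("Xapian Python bindings are not installed")
    db = xapian.Database(db_path)
    # With the Python 3 bindings, each termlist item's .term is a bytes object.
    terms = [item.term for item in db.get_document(docid).termlist()]
    bad = non_utf8_terms(terms)
    for term in bad:
        print(f"docid {docid}: non-UTF-8 term {term!r}")
    return not bad
```

Usage might look like `check_dump_file("dump.txt")` followed by `check_document_terms("/path/to/db", 1)`. Passing both checks only shows the bytes are well-formed UTF-8: text that was decoded with the wrong charset and re-encoded is still valid UTF-8. For the last step in PHP, the page can declare its charset with `header('Content-Type: text/html; charset=UTF-8');` before any output is sent.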
In your .script, do you use unhtml for this field?
    In myhtmlparse.cc, line 42: charset = "ISO-8859-1";
    so all the data from this field is treated as ISO-8859-1.
(Excuse my poor English, I'm French.)
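A minimal Python 3 sketch of robert's point, assuming (as he says) that the HTML parser behind unhtml defaults to ISO-8859-1. UTF-8 input is then read byte by byte as Latin-1 characters and converted to UTF-8 a second time. CJK ideographs encode in UTF-8 with lead bytes 0xE4 to 0xE9, which ISO-8859-1 displays as "ä" to "é", consistent with the "å" and "ä" in the garbled output quoted above. The function names are hypothetical, and the repair is a heuristic, not something Omega provides.

```python
def double_encode(text: str) -> str:
    """Simulate the bug: UTF-8 bytes read as if they were ISO-8859-1 characters.

    The returned string is what gets re-encoded to UTF-8 and stored, so the
    stored bytes are valid UTF-8 but spell the wrong characters.
    """
    return text.encode("utf-8").decode("iso-8859-1")


def undo_double_encoding(text: str):
    """Heuristic repair: return the original string, or None if text doesn't look double-encoded."""
    try:
        raw = text.encode("iso-8859-1")
    except UnicodeEncodeError:
        return None  # contains characters above U+00FF, so it wasn't produced this way
    try:
        fixed = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # e.g. genuine Latin-1 text such as "café"
    return None if fixed == text else fixed  # ASCII-only text is left alone


stored = double_encode("中文")
print(stored.startswith("ä"))  # prints True: lead byte 0xE4 shows up as "ä"
print(undo_double_encoding(stored))  # prints 中文
```

The heuristic can misfire on genuine Latin-1 text that happens to form valid UTF-8 sequences (for example "Ã©" would be "repaired" to "é"), so it is better for spotting the problem than for bulk-fixing an index; reindexing with the correct charset is probably the cleaner fix.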
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>   



