[Xapian-discuss] UTF-8 becomes gibberish in searches

athlon athlonf athlonkmf at yahoo.com
Thu Oct 18 20:47:21 BST 2007


I'm using dbi2omega and scriptindex to index a database with Chinese characters.
Searches are done with the PHP4 bindings.

Although the index is in UTF-8, the search results come out as gibberish.

For example, these characters (converted to HTML entities for this message),
同事, become something like this: åŒäº‹ä¸


What am I doing wrong here? Is it the indexing, or is it the searching?
How can I check whether the database really is in UTF-8?
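One way to narrow this down (a sketch, not something taken from the Xapian documentation): dump the raw bytes returned by get_data() and test whether they are well-formed UTF-8. The helper is_valid_utf8() below is hypothetical and written in plain PHP so it does not depend on the mbstring extension; it checks only lead and continuation bytes and does not reject every overlong or surrogate sequence. The garbled string above matches what the UTF-8 bytes for 同事 (e5908ce4ba8b) look like when displayed as Windows-1252 (å, Œ, ä, º, ‹), so if the stored bytes pass this check, the index is probably fine and the problem is more likely in how the page's character set is declared.

```php
<?php
// Hypothetical helper: rough check that $s is well-formed UTF-8.
// It validates lead bytes and continuation bytes only; it does not
// reject every overlong encoding or surrogate code point.
function is_valid_utf8($s)
{
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $c = ord($s[$i]);
        if ($c < 0x80) {
            $n = 0;                      // ASCII byte
        } elseif ($c >= 0xC2 && $c <= 0xDF) {
            $n = 1;                      // lead byte of a 2-byte sequence
        } elseif ($c >= 0xE0 && $c <= 0xEF) {
            $n = 2;                      // lead byte of a 3-byte sequence
        } elseif ($c >= 0xF0 && $c <= 0xF4) {
            $n = 3;                      // lead byte of a 4-byte sequence
        } else {
            return false;                // stray continuation or invalid byte
        }
        if ($i + $n >= $len) {
            return false;                // sequence truncated at end of string
        }
        for ($k = 1; $k <= $n; $k++) {
            // Every continuation byte must have the form 10xxxxxx.
            if ((ord($s[$i + $k]) & 0xC0) != 0x80) {
                return false;
            }
        }
        $i += $n + 1;
    }
    return true;
}

// Example: the UTF-8 bytes for 同事 (U+540C U+4E8B), and a truncated copy.
$good = "\xE5\x90\x8C\xE4\xBA\x8B";
$bad  = "\xE5\x90";
echo bin2hex($good), "\n";              // e5908ce4ba8b
var_dump(is_valid_utf8($good));         // bool(true)
var_dump(is_valid_utf8($bad));          // bool(false)

// Inside the search loop, the same check could be applied to $data:
//   echo bin2hex($data), "\n";
//   var_dump(is_valid_utf8($data));
```

If the data passes the check, declaring UTF-8 for the page, for example with header('Content-Type: text/html; charset=utf-8'); before any output is sent, is a likely next thing to try.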

I'm using a fresh install of Ubuntu and therefore a fresh version (1.0.2)
of Xapian.

This is the part of the code I use to get the results:

// Start an enquire session.
$enquire = new XapianEnquire($database);

$query_string = $_POST['terms'];

// Set up the query parser with English stemming.
$qp = new XapianQueryParser();
$stemmer = new XapianStem("english");
$qp->set_stemmer($stemmer);
$qp->set_database($database);
$qp->set_stemming_strategy(XapianQueryParser_STEM_SOME);
$qp->add_valuerangeprocessor(new XapianDateValueRangeProcessor(1));
// PHP4 bindings expose Query::OP_AND as XapianQuery_OP_AND; a bare
// OP_AND is an undefined constant.
$qp->set_default_op(XapianQuery_OP_AND);

$query = $qp->parse_query($query_string);
print "Parsed query is: " . $query->get_description() . "<br/>";

// Find the top 10 results for the query.
$enquire->set_query($query);
$enquire->set_sort_by_relevance_then_value(1, 1);
$matches = $enquire->get_mset(0, 10);

// Display the results.
print $matches->get_matches_estimated() . " results found:\n";
echo "<pre>";

$i = $matches->begin();
while (!$i->equals($matches->end())) {
    $n = $i->get_rank() + 1;
    $document = $i->get_document();
    $data = $document->get_data();

    // Parse the document data as one "name=value" pair per line,
    // starting with an empty array for each document.
    $field = array();
    foreach (explode("\n", $data) as $line) {
        $nameval = explode("=", $line, 2);
        if (count($nameval) == 2) {
            $field[$nameval[0]] = $nameval[1];
        }
    }
    print_r($field);
    echo "$n: " . $i->get_percent() . "% id=" . $i->get_docid() . "\n";

    // Advance the iterator; without this the loop never terminates.
    $i->next();
}
echo "</pre>";






