[Xapian-discuss] Problem getting Xapian working with Burmese

emmanuel at engelhart.org emmanuel at engelhart.org
Thu Jan 28 10:50:12 GMT 2010


 On Fri, Aug 21, 2009 at 02:44:44PM +0200, emmanuel at engelhart.org wrote:
>> I want to follow up on my request.
>> Is my question badly formulated? Too trivial? ... Or maybe too
>> complicated or unclear?
>
>I think nobody answered as it was hard to follow your example because
>the Burmese characters seem to have been mangled (at least the message I
>received wasn't valid utf-8).
>
>But looking at the code, I see an issue:
>
>> my $db = Search::Xapian::Database->new( './xapdb' );
>> my $enq = $db->enquire( $ARGV[0] );
>
>What this does is to create an Enquire object and set Query($ARGV[0]) as
>the query.  That works OK if $ARGV[0] is a single word which gets
>indexed as a single term, but you really want to parse the query string
>to get a Query object:
>
>    my $db = Search::Xapian::Database->new( './xapdb' );
>    my $queryparser = Search::Xapian::QueryParser->new();
>    my $query = $queryparser->parse_query( $ARGV[0] );
>    my $enq = $db->enquire( $query );
>
>I'd guess that is probably your problem, but I can't tell for sure as I
>can't test your examples...
>
>For further information on debugging this sort of problem, see:
>
>http://trac.xapian.org/wiki/FAQ/NoMatches
>

Hi Olly,

Thanks for your answer (and sorry for not replying sooner).

Your answer helped me and I think I now understand why "it does not work".

For testing purposes, I indexed one document containing a single string, "ဝီ​ကီ​ပိ​သုံး​စွဲ​သူ​များက", using index_text_without_positions() (C++ API).
See this log: http://tmp.kiwix.org/tmp/kiwix-index.log (UTF-8 encoded)

But if I run "delve -r 1 /path/to/db" on the index, I get the following output:
Term List for record #1: test က စ ပ မ ဝ သ  (utf8 encoded)
See the log: http://tmp.kiwix.org/tmp/delve.log

So it now seems clear to me why "it does not work": my word is split into single letters, and many characters (the vowel signs and other marks attached to each consonant) are dropped.
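
To illustrate what I think is happening, here is a minimal Python sketch. It is not Xapian's actual tokenizer: the rule "a term is a run of letters or digits, and every other character ends the term" is only my assumption about the behaviour, and the helper name split_on_non_word_chars is made up for this example. It prints the Unicode general category of each character in the test string and then the distinct terms that such a rule would produce.

```python
import unicodedata

# The test string from the indexed document (copied from this message).
text = "ဝီ​ကီ​ပိ​သုံး​စွဲ​သူ​များက"


def split_on_non_word_chars(s):
    """Hypothetical tokenizer, NOT Xapian's real code.

    Assumption: only characters whose general category is a letter (L*)
    or a number (N*) belong to a term; anything else (combining marks,
    format characters, spaces, punctuation) ends the current term.
    """
    terms, current = [], ""
    for ch in s:
        if unicodedata.category(ch)[0] in "LN":
            current += ch
        else:
            if current:
                terms.append(current)
            current = ""
    if current:
        terms.append(current)
    return terms


# Show each code point with its general category and name.
for ch in text:
    print("U+%04X %s %s" % (ord(ch), unicodedata.category(ch),
                            unicodedata.name(ch, "?")))

# Distinct terms in code point order, similar to a delve term list.
terms = sorted(set(split_on_non_word_chars(text)))
print(" ".join(terms))
```

If this guess is right, every consonant in the string is followed by a combining mark (category Mn or Mc), so each consonant ends up as a term of its own, which would be consistent with the term list delve shows apart from "test".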

Am I right? Can we avoid that and index "ဝီ​ကီ​ပိ​သုံး​စွဲ​သူ​များက" as a single word?

Regards
Emmanuel





