[Xapian-discuss] Problem getting Xapian working with Burmese
emmanuel at engelhart.org
Thu Jan 28 10:50:12 GMT 2010
On Fri, Aug 21, 2009 at 02:44:44PM +0200, emmanuel at engelhart.org wrote:
>> I want to update my request.
>> Is my question badly formulated? too trivial? ... or maybe pretty
>> complicated/unclear?
>
>I think nobody answered as it was hard to follow your example because
>the Burmese characters seem to have been mangled (at least the message I
>received wasn't valid utf-8).
>
>But looking at the code, I see an issue:
>
>> my $db = Search::Xapian::Database->new( './xapdb' );
>> my $enq = $db->enquire( $ARGV[0] );
>
>What this does is to create an Enquire object and set Query($ARGV[0]) as
>the query. That works OK if $ARGV[0] is a single word which gets
>indexed as a single term, but you really want to parse the query string
>to get a Query object:
>
> my $db = Search::Xapian::Database->new( './xapdb' );
> my $queryparser = Search::Xapian::QueryParser->new();
> my $query = $queryparser->parse_query( $ARGV[0] );
> my $enq = $db->enquire( $query );
>
>I'd guess that is probably your problem, but I can't tell for sure as I
>can't test your examples...
>
>For further information on debugging this sort of problem, see:
>
>http://trac.xapian.org/wiki/FAQ/NoMatches
>
Hi Olly,
Thanks for your answer (and sorry for not having answered before).
Your answer helped me, and I think I now understand why "it does not work".
For test purposes, I indexed one document containing the single string "ဝီကီပိသုံးစွဲသူများက" with index_text_without_positions() (C++ API).
See this log: http://tmp.kiwix.org/tmp/kiwix-index.log (utf8 encoded)
But if I run "delve -r 1 /path/to/db" on the index, I get the following answer:
Term List for record #1: test က စ ပ မ ဝ သ (utf8 encoded)
See the log: http://tmp.kiwix.org/tmp/delve.log
So it seems clear to me why "it does not work": my word is split into single letters, and many of its characters are removed.
Am I right? Can we avoid that and index "ဝီကီပိသုံးစွဲသူများက" as a single word?
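To illustrate what I think is happening, here is a small sketch using only Python's standard library (no Xapian involved). It assumes, and this is only my guess, that the term generator treats Unicode letters and numbers as word characters and every other character, including the Burmese combining marks, as a separator. The function name split_on_non_word_chars is made up for this example and is not part of any Xapian API.

```python
import unicodedata


def split_on_non_word_chars(text):
    """Split text into runs of letters/numbers; anything else separates terms.

    Assumption for illustration only: general categories starting with
    "L" (letter) or "N" (number) are word characters, while marks ("M...")
    are not. This is not taken from Xapian's source.
    """
    terms, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "N"):
            current.append(ch)
        else:
            # A mark, space or punctuation character ends the current term.
            if current:
                terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


word = "ဝီကီပိသုံးစွဲသူများက"

# Unique terms in code point order, comparable to the delve term list.
terms = sorted(set(split_on_non_word_chars(word)))
print(" ".join(terms))

# Show the general category of each character in the word.
for ch in word:
    print("U+%04X %s" % (ord(ch), unicodedata.category(ch)))
```

Under that assumption, the sketch yields the same six single letters that delve lists (apart from "test"): the base consonants are in category Lo, while the vowel signs and medials are in the mark categories (Mn or Mc), so each one cuts the word.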
Regards
Emmanuel