[Xapian-discuss] Problem getting Xapian working with Burmese
Emmanuel Engelhart
emmanuel at engelhart.org
Fri Jul 17 18:30:43 BST 2009
Hi,
I use Xapian in my project with multiple Latin languages and it works
well. I have also tried it with Parsi, and that seems to work too.
With Burmese, however, things are a little different. Here is what I do:
mkdir html
cd html
wget -O doc.html http://my.wikipedia.org
cd ..
omindex --db=./xapdb ./html/
To run a simple search against the database I use the following Perl
script (my own code is in C++ and it does not work either):
===================================================================
#!/usr/bin/perl
use strict;
use warnings;
use Search::Xapian;
use utf8;
# Open the database created by omindex.
my $db = Search::Xapian::Database->new( './xapdb' );
# Build an Enquire object from the first command-line argument.
my $enq = $db->enquire( $ARGV[0] );
printf "Running query '%s'\n", $enq->get_query()->get_description();
# Fetch at most the first 10 matches.
my @matches = $enq->matches(0, 10);
print scalar(@matches) . " results found\n";
foreach my $match ( @matches ) {
    my $doc = $match->get_document();
    printf "ID %d %d%% [ %s ]\n", $match->get_docid(),
        $match->get_percent(), $doc->get_data();
}
===================================================================
./search.pl problems
... returns the document, because the page begins with a sentence in
English that contains this word.
./search.pl ၁၂၆၆
... returns a result too.
./search.pl ဝီကီပိဒိယအကြောင်း
./search.pl ဗဟိုစာမျက်နှာ
... do not work. In fact, searching does not work most of the time. It
seems to work only with Burmese words which are short and/or contain
only certain characters.
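To put a number on "short", a minimal sketch such as the following could print the UTF-8 byte length of each query term used above. The `byte_len` helper is hypothetical and not part of Xapian; it only measures the strings and makes no claim about why the longer ones fail.

```shell
#!/bin/sh
# Hypothetical helper (not part of Xapian): print the UTF-8 byte length
# of its first argument.
byte_len() {
    # wc -c counts bytes rather than characters; tr removes the padding
    # that some wc implementations add around the number.
    printf '%s' "$1" | wc -c | tr -d ' '
}

# The four query terms tried above.
for term in problems ၁၂၆၆ ဝီကီပိဒိယအကြောင်း ဗဟိုစာမျက်နှာ; do
    printf '%s bytes: %s\n' "$(byte_len "$term")" "$term"
done
```

Comparing these lengths for the queries that match and the ones that do not may help narrow down whether length or particular characters make the difference.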
Is that normal?
Regards
Emmanuel