[Xapian-discuss] Problem getting Xapian working with Burmese
emmanuel at engelhart.org
Fri Aug 21 13:44:44 BST 2009
Hi
I'd like to follow up on my request.
Is my question badly formulated? Too trivial? Or perhaps too complicated or unclear?
In fact, I'm neither a Xapian nor a search-engine expert, so I have no idea where to start investigating.
Even without a full answer, could someone give me an idea of how to better understand the issue?
Regards
Emmanuel
On Fri 17/07/09 19:30, "Emmanuel Engelhart" emmanuel at engelhart.org wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> I use Xapian in my project with multiple Latin-script languages and it works
> well. I have also tried Parsi, and it seems to work too.
>
> With Burmese, however, things are a little different. Here is what I do:
>
> mkdir html
> cd html
> wget -O doc.html http://my.wikipedia.org
> cd ..
> omindex --db=./xapdb ./html/
>
> To run a simple search on the database I use the following Perl script (my
> real code is in C++, and it does not work either):
>
> ===================================================================
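> #!/usr/bin/perl
> use strict;
> use warnings;
> use utf8;
> use Search::Xapian;
>
> die "Usage: $0 TERM\n" unless @ARGV;
>
> # Open the database built by omindex.
> my $db = Search::Xapian::Database->new( './xapdb' );
>
> # As far as I know, a plain string passed to enquire() becomes a single-term
> # query (no QueryParser), so it must match an indexed term exactly.
> my $enq = $db->enquire( $ARGV[0] );
>
> printf "Running query '%s'\n",
>     $enq->get_query()->get_description();
>
> # Fetch at most 10 matches, starting with the first one.
> my @matches = $enq->matches(0, 10);
>
> print scalar(@matches) . " results found\n";
>
> foreach my $match ( @matches ) {
>     my $doc = $match->get_document();
>     printf "ID %d %d%% [ %s ]\n", $match->get_docid(),
>         $match->get_percent(), $doc->get_data();
> }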
> ===================================================================
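A note on the script above: as far as I know, when Search::Xapian's enquire() receives a plain string, it wraps it in a single-term query instead of running it through a QueryParser, so the command-line argument has to equal an indexed term byte for byte. The Python sketch below models only that lookup step. The toy indexer (lowercase, split on whitespace) and the sample documents are hypothetical simplifications, not omindex's real tokenizer.

```python
from collections import defaultdict


def build_index(docs):
    """Toy inverted index mapping term -> set of docids.

    HYPOTHETICAL simplification: lowercase the text and split on
    whitespace. This is not how omindex actually tokenizes.
    """
    index = defaultdict(set)
    for docid, text in docs.items():
        for term in text.lower().split():
            index[term].add(docid)
    return index


def raw_term_query(index, query):
    """Look the query string up as one literal term.

    No parsing, no lowercasing, no stemming and no word splitting
    is applied to the query itself.
    """
    return sorted(index.get(query, set()))


# Made-up sample documents for illustration only.
docs = {1: "Wikipedia has problems", 2: "Another page"}
index = build_index(docs)

print(raw_term_query(index, "problems"))      # [1]
print(raw_term_query(index, "Problems"))      # [] : case differs from the stored term
print(raw_term_query(index, "problem"))       # [] : no stemming on a raw term
print(raw_term_query(index, "has problems"))  # [] : two words are not one term
```

So a Burmese query can only match if the indexer stored exactly that string as a single term.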
>
> ./search.pl problems
>
> ... returns the document, because there is an English sentence containing
> this word at the beginning of the page.
>
> ./search.pl ၁၂၆၆
>
> ... returns a result too.
>
> ./search.pl ဝီကီပိဒိယအကြောင်း
> ./search.pl ဗဟိုစာမျက်နှာ
>
> ... do not work. In fact, it fails most of the time: it seems to work
> only with Burmese words which are short and/or contain only certain
> characters.
>
> Is that normal?
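One hypothesis worth checking, and it is only an assumption: the symptom (digits and short words match, longer words do not) is what you would see if the indexer's character table does not recognise some Myanmar characters as word characters and splits words at them. As far as I know, U+103A (MYANMAR SIGN ASAT) and the medials U+103B to U+103E were only added in Unicode 5.1, so a tokenizer built on older tables could treat them as separators. Whether the Xapian version used here does so is not verified. The Python sketch below simulates this with a hypothetical set of "unknown" code points; the word-character categories are also an assumption of the sketch.

```python
import unicodedata

# Unicode categories treated as word characters in this sketch
# (letters, combining marks, numbers, connector punctuation).
WORD_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo",
                   "Mn", "Mc", "Me", "Nd", "Nl", "No", "Pc"}

# HYPOTHETICAL: code points an outdated character table might not know.
# U+103A (asat) and U+103B..U+103E (medials) arrived in Unicode 5.1.
UNKNOWN_TO_OLD_TABLE = frozenset(range(0x103A, 0x103F))


def is_word_char(ch, unknown=frozenset()):
    """True if ch counts as part of a word; 'unknown' code points never do."""
    if ord(ch) in unknown:
        return False
    return unicodedata.category(ch) in WORD_CATEGORIES


def tokenize(text, unknown=frozenset()):
    """Split text into maximal runs of word characters."""
    terms, current = [], []
    for ch in text:
        if is_word_char(ch, unknown):
            current.append(ch)
        elif current:
            terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


# "မျက်", spelled here with medial ya (U+103B) and asat (U+103A).
word = "\u1019\u103b\u1000\u103a"
# "၁၂၆၆" in Myanmar digits.
digits = "\u1041\u1042\u1046\u1046"

print(tokenize(word))                          # one term: the whole word
print(tokenize(word, UNKNOWN_TO_OLD_TABLE))    # ['မ', 'က'] : the word is chopped up
print(tokenize(digits, UNKNOWN_TO_OLD_TABLE))  # ['၁၂၆၆'] : digits survive intact
```

If something like this is happening, a long Burmese query term can never match, because the index only holds its fragments. Separately, Burmese does not normally put spaces between words, so even with correct tables a whole unspaced phrase may end up as a single term. Inspecting the terms actually stored for the document (for example with Xapian's delve utility) should confirm or rule out either explanation.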
>
> Regards
> Emmanuel
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> iEYEARECAAYFAkpgtUAACgkQn3IpJRpNWtPRRgCfZukUGfG8Eliv6SKZDXoAWnlI
> SP8Animz/5IUtSl9Ba2oV8vJLkjdLcDX
> =QjZX
> -----END PGP SIGNATURE-----