[Xapian-discuss] Problem getting Xapian working with Burmese

Emmanuel Engelhart emmanuel at engelhart.org
Thu Feb 11 20:16:00 GMT 2010


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Olly Betts wrote:
> On Tue, Feb 02, 2010 at 10:44:37AM +0100, emmanuel at engelhart.org wrote:
>> On Sun 31/01/10 23:53, "Olly Betts" olly at survex.com wrote:
>>> http://trac.xapian.org/ticket/355
>> Indeed, this seems to be the issue.
>> I have tested against the development source code and it works better (fewer
>> cuts in the words).
> 
> Good, thanks for checking.
> 
>>> The second issue in your case is that there are zero-width space
>>> characters in there as well, which currently act as word breaks.  These are
>>> present to indicate acceptable places to split a word when wrapping text,
>>> so we should ideally just strip them out when generating terms.
>> OK, so that may explain why there are still cuts in the words (even with the
>> dev. code).
>>
>> Do I have to open a bug for that?
>> Is there a plan to fix that?
> 
> I didn't bother opening a ticket, as it's a quick change.  It's now addressed
> on trunk by r13921.  If you could test that and see if it works better for you,
> that would be great.

Looks good, but trunk does not seem to compile for me:

libtool: compile:  g++ -DHAVE_CONFIG_H -I. -I./common -I./include -Wall
-W -Wredundant-decls -Wpointer-arith -Wcast-qual -Wcast-align
-Wno-long-long -Wformat-security -fno-gnu-keywords -Wundef -Wshadow
-Woverloaded-virtual -Wstrict-null-sentinel -Wshadow -Wstrict-overflow=1
-Winit-self -Wlogical-op -Wmissing-declarations -fvisibility=hidden -g
-O2 -MT common/serialise-double.lo -MD -MP -MF
common/.deps/serialise-double.Tpo -c common/serialise-double.cc  -fPIC
-DPIC -o common/.libs/serialise-double.o
common/serialise-double.cc: In function ‘double unserialise_double(const
char**, const char*)’:
common/serialise-double.cc:141: error: ‘SerialisationError’ is not a
member of ‘Xapian’
common/serialise-double.cc:157: error: ‘SerialisationError’ is not a
member of ‘Xapian’
common/serialise-double.cc:169: error: ‘SerialisationError’ is not a
member of ‘Xapian’
make[2]: *** [common/serialise-double.lo] Error 1


> For 1.0.x, I think all we can do is to make these characters phrase generators,
> or else we introduce an incompatibility with existing databases.

For me this is not critical. As a workaround I will use a tokenizer
that I have written myself.
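
As a rough sketch of what such a workaround might look like (this is not
the change made in r13921, only an illustration), the zero-width spaces
(U+200B, encoded in UTF-8 as the bytes E2 80 8B) could be stripped from
the text before it is handed to the indexer. The helper name strip_zwsp
is hypothetical, and the sketch assumes the input is valid UTF-8:

```cpp
#include <string>

// Hypothetical helper (not part of Xapian): return a copy of `text`
// with every U+200B ZERO WIDTH SPACE removed.
// Assumes `text` is valid UTF-8, where U+200B is the bytes E2 80 8B.
std::string strip_zwsp(const std::string& text) {
    static const std::string zwsp = "\xE2\x80\x8B";
    std::string out;
    out.reserve(text.size());
    std::string::size_type pos = 0;
    while (true) {
        // Find the next zero-width space at or after `pos`.
        std::string::size_type hit = text.find(zwsp, pos);
        if (hit == std::string::npos) {
            // No more matches: copy the remainder and stop.
            out.append(text, pos, std::string::npos);
            break;
        }
        // Copy the bytes before the match, then skip over the match.
        out.append(text, pos, hit - pos);
        pos = hit + zwsp.size();
    }
    return out;
}
```

The cleaned string could then be passed to
Xapian::TermGenerator::index_text() in place of the raw text, so that a
word containing a wrapping hint is no longer split into several terms.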

Regards
Emmanuel
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAkt0ZX4ACgkQn3IpJRpNWtOHnACgpdUci1H1IW6cpvPsLzyPMu/Y
hAsAoK2HYjPX8zlw8B6viG2WXBHlqSPS
=HhhO
-----END PGP SIGNATURE-----


