[Xapian-discuss] Problem getting Xapian working with Burmese

emmanuel at engelhart.org emmanuel at engelhart.org
Tue Feb 2 09:44:37 GMT 2010


 On Sun 31/01/10 23:53, "Olly Betts" olly at survex.com wrote:
> There seem to be two issues here.
> 
> The first is with NON_SPACING_MARK characters (which I think is what
> you are referring to above).  In 1.1.x, these are treated as part of
> the word, but this issue was reported when we were at about 1.0.11, so we
> couldn't just change the behaviour of 1.0.x without breaking existing
> databases.  So we went for the less good but compatible approach of
> making QueryParser treat these characters as phrase generators.
> 
> This is the ticket for that issue which has more detail:
> 
> http://trac.xapian.org/ticket/355

Indeed, this seems to be the issue.
I ran a test against the development source code and it works better (fewer cuts in the words).
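
To check which characters in a given Burmese word fall into this category, something like the following can help. It is only a minimal sketch using Python's standard unicodedata module, not Xapian code; the function name non_spacing_marks is my own, and "Mn" is the Unicode general category that I assume corresponds to what Xapian calls NON_SPACING_MARK.

```python
import unicodedata


def non_spacing_marks(text):
    """Return the code points in text whose general category is Mn
    (non-spacing mark), formatted as 'U+XXXX'.

    Hypothetical helper for inspection only; it does not call Xapian.
    """
    return [
        "U+%04X" % ord(ch)
        for ch in text
        if unicodedata.category(ch) == "Mn"
    ]


# The word "Myanmar" in Burmese script, written with escapes so that
# every code point is visible.
word = "\u1019\u103c\u1014\u103a\u1019\u102c"
print(non_spacing_marks(word))
```

Any code point listed by this helper is one that 1.0.x would not keep inside the word.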

> The second issue in your case is that there are zero-width space
> characters in there as well, which currently act as word breaks.  These are present
> to indicate acceptable places to split a word when wrapping text, so we
> should ideally just strip them out when generating terms.
 
OK, so that may explain why there are still cuts in the words (even with the development code).
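
As a stopgap on my side, I could strip the zero-width spaces from the text before it reaches Xapian, both at indexing time and in the query string. A minimal sketch, assuming the character in question is U+200B (ZERO WIDTH SPACE); the function name strip_zero_width_spaces is my own and this is not part of Xapian:

```python
# Assumption: the zero-width space in the documents is U+200B.
ZERO_WIDTH_SPACE = "\u200b"


def strip_zero_width_spaces(text):
    """Remove every U+200B from text so that it can no longer act
    as a word break when terms are generated.

    Hypothetical preprocessing step applied before handing the text
    to the indexer or the query parser.
    """
    return text.replace(ZERO_WIDTH_SPACE, "")


# A Burmese word with a zero-width space inserted as a line-wrap hint.
raw = "\u1019\u103c\u1014\u103a\u200b\u1019\u102c"
clean = strip_zero_width_spaces(raw)
print(len(raw), len(clean))
```

The same function would have to be applied to the user's query as well, otherwise the indexed terms and the query terms would not match.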

Should I open a bug report for that?
Are there plans to fix it?

Regards
Emmanuel



