Possible bug using FLAG_WORD_BREAKS with fullwidth Unicode codepoints
Olly Betts
olly at survex.com
Thu Jan 18 03:46:12 GMT 2024
On Wed, Jan 10, 2024 at 09:02:03AM +0100, Robert Stepanek wrote:
> On Tue, Jan 9, 2024, at 3:28 AM, Olly Betts wrote:
> > Did you already check the other ranges for cased letters? I can but if
> > you have already there's not much point.
>
> I did not. If you find time, that'd be great. Otherwise I can make
> room for it in the next days.
I hacked up a quick Perl script to check, and no character in any of the
ranges changes if I apply Perl's lc or uc function (run against the code
from before your fix, the script does report problems).
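For anyone wanting to repeat this sort of check, here is a rough sketch of
the idea in Python rather than Perl. It is not the script I used: the
function name cased_codepoints is made up for illustration, and the range
in the demo (the Unicode "Halfwidth and Fullwidth Forms" block,
U+FF00..U+FFEF) is only an example, not one of the ranges from the Xapian
source. It also relies on Python's own case mapping, which tracks whatever
Unicode version the interpreter was built with.

```python
def cased_codepoints(first, last):
    """Return the codepoints in [first, last] that change under
    lower() or upper(), i.e. those that have a case mapping."""
    changed = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        if ch.lower() != ch or ch.upper() != ch:
            changed.append(cp)
    return changed


if __name__ == "__main__":
    # Example range only (an assumption, not taken from Xapian):
    # the Halfwidth and Fullwidth Forms block.
    for cp in cased_codepoints(0xFF00, 0xFFEF):
        ch = chr(cp)
        print(f"U+{cp:04X} {ch!r} lower={ch.lower()!r} upper={ch.upper()!r}")
```

A range that is safe to treat as caseless should give an empty list; any
codepoint listed is one where case folding would alter the term.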
I said I was leaning towards backporting this to 1.4.x, but having looked
into it, I think that's not a good idea, as it would result in some
queries no longer matching unless the affected documents were reindexed.
Cheers,
Olly