[Xapian-devel] patch - Some CJK codepoints are also punctuation

Greg Banks gnb at fastmail.fm
Wed Mar 20 05:23:42 GMT 2013



On Sat, Mar 16, 2013, at 11:43 AM, Olly Betts wrote:
> This seems a sensible change, but it really needs some test coverage.

Here's a new version with a test case added.

This test case exposed a pre-existing bug, which the patch also fixes.
Suppose that A..F are CJK characters; the text

ABC DEF

should generate the following terms:

A
AB
B
BC
C
D
DE  <---- this one was not generated before the fix
E
EF
F
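
To make the expected term list concrete, here is a minimal Python sketch of
unigram plus bigram generation over runs of characters. It is not Xapian's
code: the function name cjk_ngram_terms is hypothetical, and it assumes runs
are separated only by whitespace, whereas the patch also treats some CJK
punctuation codepoints as separators.

```python
def cjk_ngram_terms(text):
    """Return unigram and bigram terms for each whitespace-separated run.

    Illustrative only: a real tokenizer would also break runs at
    punctuation and at non-CJK characters.
    """
    terms = []
    for run in text.split():
        for i, ch in enumerate(run):
            terms.append(ch)  # unigram for this character
            if i + 1 < len(run):
                # bigram with the following character, never crossing a run boundary
                terms.append(run[i:i + 2])
    return terms


print(cjk_ngram_terms("ABC DEF"))
```

The point of the sketch is that bigram state restarts at each run, so the
first bigram of the second run (DE) appears and no bigram spans the gap (CD).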


-- 
Greg.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: some-cjk-codepoints-are-punctuation-v2.patch
Type: text/x-patch
Size: 3321 bytes
Desc: not available
URL: <http://lists.xapian.org/pipermail/xapian-devel/attachments/20130320/23e4852a/attachment.bin>

