[Xapian-tickets] [Xapian] #809: indexing CJK documents seems quite off
Xapian
nobody at xapian.org
Wed Apr 14 23:03:14 BST 2021
----------------------------+------------------------
Reporter: jay sun | Owner: Olly Betts
Type: defect | Status: new
Priority: normal | Milestone:
Component: Other | Version: 1.4.18
Severity: normal | Keywords:
Blocked By: | Blocking:
Operating System: Linux |
----------------------------+------------------------
Hi
I have a directory of .docx files whose contents are all in Chinese.
I ran Recoll to search for a particular string and got only 5 matches. I
made sure that ckjoff=0 and cjkgramlen=3.
I ran DocFetcher to search for the same string in the same set of Chinese
documents and got 60+ matches.
Therefore, there must be an issue with how Recoll/Xapian indexes these
documents.
Thanks,
Jay
--
Ticket URL: <https://trac.xapian.org/ticket/809>
Xapian <https://xapian.org/>