[Xapian-tickets] [Xapian] #809: indexing CJK documents seems quite off

Xapian nobody at xapian.org
Wed Apr 14 23:03:14 BST 2021


#809: indexing CJK documents seems quite off
----------------------------+------------------------
        Reporter:  jay sun  |      Owner:  Olly Betts
            Type:  defect   |     Status:  new
        Priority:  normal   |  Milestone:
       Component:  Other    |    Version:  1.4.18
        Severity:  normal   |   Keywords:
      Blocked By:           |   Blocking:
Operating System:  Linux    |
----------------------------+------------------------
 Hi

 I have a directory of docx files, and all contents are in Chinese.

 I ran Recoll to search for a particular string and got only 5 matches. I
 made sure ckjoff=0 and cjkgramlen=3.

 I then ran DocFetcher to search for the same string in the same set of
 Chinese documents, and it found 60+ matches.

 Therefore there must be some issue with recoll/xapian indexing of those
 documents.
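
 For context on the n-gram length setting mentioned above, here is a
 minimal sketch of overlapping character n-gram indexing in general. It is
 not Recoll's or Xapian's actual code; the function names char_ngrams and
 ngram_match and the sample strings are hypothetical, chosen only for
 illustration. It shows how the n-gram length decides which terms exist in
 an index. Whether this explains the difference in match counts reported
 here is not established.

```python
def char_ngrams(text, n):
    """Return all overlapping character n-grams of length n in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_match(query, document, n):
    """Return True if every n-gram of the query also occurs in the document.

    Illustrative assumption: both the document and the query are split
    into n-grams of the same fixed length n, with no shorter terms indexed.
    """
    doc_terms = set(char_ngrams(document, n))
    query_terms = char_ngrams(query, n)
    # A query shorter than n yields no n-grams, so it cannot match here.
    return bool(query_terms) and all(t in doc_terms for t in query_terms)


if __name__ == "__main__":
    doc = "中文文档索引"  # hypothetical sample text
    print(char_ngrams(doc, 3))
    print(ngram_match("文档", doc, 3))
    print(ngram_match("文档", doc, 2))
```

 In this toy model a two-character query finds nothing in an index built
 only from 3-grams, while the same query matches when 2-grams are indexed.
 A real indexer may also emit shorter terms, so this is only a way to
 reason about the setting, not a diagnosis.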

 Thanks
 Jay
-- 
Ticket URL: <https://trac.xapian.org/ticket/809>
Xapian <https://xapian.org/>