[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator

Xapian nobody at xapian.org
Sat Apr 2 07:10:22 BST 2016


#180: Add support for CJK text to queryparser and termgenerator
-------------------------+------------------------------
 Reporter:  richard      |             Owner:  richard
     Type:  enhancement  |            Status:  closed
 Priority:  normal       |         Milestone:  1.2.22
Component:  QueryParser  |           Version:  SVN trunk
 Severity:  normal       |        Resolution:  fixed
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+------------------------------
Description changed by mariakatosvich:

Old description:

> Some code to do this kind of tokenisation is now available at
> http://code.google.com/p/cjk-tokenizer/ which should probably be used as
> the basis for supporting this in Xapian.
>
> We could add this as a QueryParser/TermGenerator option without
> breaking API compatibility.  Marking for consideration later in 1.1.x, but
> it could probably go in 1.2.x as it's likely to be ABI compatible too.

New description:

 Some code to do this kind of tokenisation is now available at
 http://code.google.com/p/cjk-tokenizer/ which should probably be used as
 the basis for supporting this in Xapian.

 We could add this as a QueryParser/TermGenerator option without breaking
 API compatibility.  Marking for consideration later in 1.1.x, but it could
 probably go in 1.2.x as it's likely to be ABI compatible too.

 I think it's probably better to have the user select "CJKV-mode".
 Exploding every string being indexed into a vector and then scanning it
 to see if CJKV characters are present is going to add a lot of overhead
 for everyone, even
 those indexing non-CJKV text.
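For illustration only, an opt-in mode of this kind could look roughly like
the Python sketch below. It assumes a hypothetical `CjkMode` switch and a
hypothetical `tokenise()` function (neither is part of Xapian's API),
checks only a small, incomplete set of Unicode CJK blocks, and indexes each
run of CJK characters as unigrams plus overlapping bigrams, which is one
common way to tokenise text that has no spaces between words. Because the
CJK range check sits behind the mode test, text indexed with the mode off
never has its characters compared against the CJK ranges.

```python
from enum import Enum
from typing import List


class CjkMode(Enum):
    """Hypothetical user-selected option; not a real Xapian API."""
    OFF = 0
    NGRAM = 1


# A few common CJK blocks. Incomplete: real code needs a fuller table.
CJK_RANGES = (
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
)


def is_cjk(ch: str) -> bool:
    """Return True if the character falls in one of CJK_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def tokenise(text: str, mode: CjkMode) -> List[str]:
    """Split on whitespace; in NGRAM mode, emit CJK unigrams and bigrams."""
    terms: List[str] = []
    word = ""  # pending non-CJK word
    prev = ""  # previous CJK character in the current run, for bigrams
    for ch in text:
        # is_cjk() is only evaluated when the user opted in, so OFF mode
        # does no CJK range checks at all.
        if mode is CjkMode.NGRAM and is_cjk(ch):
            if word:
                terms.append(word)
                word = ""
            terms.append(ch)  # unigram
            if prev:
                terms.append(prev + ch)  # bigram with the previous character
            prev = ch
        elif ch.isspace():
            if word:
                terms.append(word)
                word = ""
            prev = ""
        else:
            word += ch
            prev = ""  # a non-CJK character ends the current CJK run
    if word:
        terms.append(word)
    return terms
```

With the mode on, `tokenise("中文 abc", CjkMode.NGRAM)` returns
`['中', '文', '中文', 'abc']`; with it off, the same text gives
`['中文', 'abc']`, so an unspaced CJK sentence ends up as one long term.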

--

--
Ticket URL: <https://trac.xapian.org/ticket/180#comment:35>
Xapian <//xapian.org/>
Xapian


