[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator

Xapian nobody at xapian.org
Thu Aug 18 07:06:36 BST 2011


#180: Add support for CJK text to queryparser and termgenerator
-------------------------+--------------------------------------------------
 Reporter:  richard      |        Owner:  richard  
     Type:  enhancement  |       Status:  assigned 
 Priority:  normal       |    Milestone:  1.3.0    
Component:  QueryParser  |      Version:  SVN trunk
 Severity:  normal       |   Resolution:           
 Keywords:               |    Blockedby:           
 Platform:  All          |     Blocking:           
-------------------------+--------------------------------------------------

Comment(by olly):

 It looks to me like CJK::tokenizer::tokenize() just ignores non-CJK
 characters, so it can, for example, produce a bigram from two CJK
 characters with arbitrary non-CJK text between them.

 But looking at where we call this method, it seems we only ever pass it
 all-CJK input.

 So I propose we scrap the CJK test in there and just n-gram away on
 whatever the input is.
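
 To make the difference concrete, here is a minimal sketch of the two
 behaviours. It is not the code from the patch: is_cjk(),
 bigrams_skipping_non_cjk() and bigrams_unfiltered() are hypothetical names,
 the CJK check covers only the CJK Unified Ideographs block (U+4E00 to
 U+9FFF) as a simplification, and only bigrams are emitted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical predicate (an assumption for this sketch): treats only the
// CJK Unified Ideographs block as CJK. A real check would cover more ranges.
bool is_cjk(char32_t ch) {
    return ch >= 0x4E00 && ch <= 0x9FFF;
}

// Proposed behaviour: no CJK test, just emit every pair of adjacent
// code points from whatever input is given.
std::vector<std::u32string> bigrams_unfiltered(const std::u32string& text) {
    std::vector<std::u32string> out;
    for (std::size_t i = 0; i + 1 < text.size(); ++i) {
        out.push_back(text.substr(i, 2));
    }
    return out;
}

// Behaviour described above: non-CJK code points are silently dropped
// first, so CJK characters on either side of them end up paired together.
std::vector<std::u32string> bigrams_skipping_non_cjk(const std::u32string& text) {
    std::u32string kept;
    for (char32_t ch : text) {
        if (is_cjk(ch)) kept += ch;
    }
    return bigrams_unfiltered(kept);
}
```

 With the input U+4E2D "xy" U+6587, the skipping version yields a single
 bigram joining the two CJK characters across the Latin text, while the
 unfiltered version yields three bigrams that include the Latin letters.
 For all-CJK input the two functions return the same result, which is why
 dropping the test should be harmless if the callers only ever pass
 all-CJK text.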

 Thoughts?

-- 
Ticket URL: <http://trac.xapian.org/ticket/180#comment:25>
Xapian <http://xapian.org/>