[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator

Xapian nobody at xapian.org
Mon Aug 22 04:13:17 BST 2011


#180: Add support for CJK text to queryparser and termgenerator
-------------------------+--------------------------------------------------
 Reporter:  richard      |        Owner:  richard  
     Type:  enhancement  |       Status:  assigned 
 Priority:  normal       |    Milestone:  1.3.0    
Component:  QueryParser  |      Version:  SVN trunk
 Severity:  normal       |   Resolution:           
 Keywords:               |    Blockedby:           
 Platform:  All          |     Blocking:           
-------------------------+--------------------------------------------------

Comment(by olly):

 Dai Youli noted on IRC that mixed numbers like 2千3百 (two thousand three
 hundred) get indexed as four separate terms. That's not terrible, since
 the query is split the same way at search time, but it's not ideal
 either: a search for 2千3百 would also match 3千2百, as well as documents
 containing those characters nowhere near each other.

 Perhaps digits which appear among CJK characters should be included in the
 span of text which gets passed for n-gramming.
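
 To illustrate the idea, here is a rough Python sketch (not Xapian's actual
 tokeniser; the names cjk_terms and bigrams, the restriction to the basic
 CJK Unified Ideographs block, and the use of plain bigrams are all
 assumptions made for the example). It treats a run of CJK characters and
 ASCII digits as one span, provided the run contains at least one CJK
 character, and then generates overlapping bigrams from that span:

```python
import re

# Assumption: only the basic CJK Unified Ideographs block counts as "CJK" here.
CJK = "\u4e00-\u9fff"

# A span is a run of CJK characters and ASCII digits that contains at least
# one CJK character, so a purely numeric run such as "2011" is not a span.
SPAN_RE = re.compile(f"[0-9{CJK}]*[{CJK}][0-9{CJK}]*")


def bigrams(span):
    """Return the overlapping 2-character n-grams of span."""
    if len(span) < 2:
        return [span]
    return [span[i:i + 2] for i in range(len(span) - 1)]


def cjk_terms(text):
    """Hypothetical helper: bigram terms for every CJK/digit span in text."""
    terms = []
    for match in SPAN_RE.finditer(text):
        terms.extend(bigrams(match.group()))
    return terms


print(cjk_terms("2千3百"))
print(cjk_terms("3千2百"))
```

 With digits kept inside the span, 2千3百 yields the terms 2千, 千3 and 3百,
 while 3千2百 yields 3千, 千2 and 2百, so the two numbers no longer share
 any terms.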

-- 
Ticket URL: <http://trac.xapian.org/ticket/180#comment:28>
Xapian <http://xapian.org/>