[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator

Xapian nobody at xapian.org
Mon Jul 25 20:59:27 BST 2011


#180: Add support for CJK text to queryparser and termgenerator
-------------------------+--------------------------------------------------
 Reporter:  richard      |        Owner:  richard  
     Type:  enhancement  |       Status:  assigned 
 Priority:  normal       |    Milestone:  1.2.x    
Component:  QueryParser  |      Version:  SVN trunk
 Severity:  normal       |   Resolution:           
 Keywords:               |    Blockedby:           
 Platform:  All          |     Blocking:           
-------------------------+--------------------------------------------------

Comment(by bschaefer):

 I've updated the patch with everything you wanted changed. The parser was a
 much better place to put it. I only added one rule to the CFG:
 "compound_term(T) ::= CJKTERM(U)." and then added a function under Term to
 produce the n-grams. I can also see where to add some more rules to
 integrate CJK better, but for now this is a good start for seeing whether
 this is the direction you want.
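
 To make the n-gram step concrete, here is a minimal sketch in Python rather
 than the patch's C++. It assumes overlapping n-grams of length 1 and 2 over
 the code points of a single CJKTERM token; the name cjk_ngrams and the
 max_n parameter are hypothetical and are not taken from the patch.

```python
def cjk_ngrams(text, max_n=2):
    """Return the overlapping n-grams of length 1..max_n found in text.

    Illustrative only: cjk_ngrams and max_n are made-up names, and the
    choice of unigrams plus bigrams is an assumption, not the patch's code.
    """
    grams = []
    for start in range(len(text)):
        for n in range(1, max_n + 1):
            # Skip n-grams that would run past the end of the token.
            if start + n <= len(text):
                grams.append(text[start:start + n])
    return grams
```

 Under these assumptions, a four-character token yields four unigrams and
 three bigrams, each of which would become a term in the compound term.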

 If there is anything else you would like me to change, let me know. I also
 added cases to the regression tests for the term generator and query
 parser, though a few things still need to be added to the grammar to allow
 operators such as NEAR to be used with CJK. Still, this is a big step
 forward from no CJK support at all. It would also be helpful if you know
 anyone fluent in any of the CJK languages who could test the semantics
 more thoroughly, including special cases and symbols. Then again, the real
 solution to this is proper segmentation.

-- 
Ticket URL: <http://trac.xapian.org/ticket/180#comment:20>
Xapian <http://xapian.org/>