[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator

Xapian nobody at xapian.org
Thu Apr 9 01:35:59 BST 2009


#180: Add support for CJK text to queryparser and termgenerator
-------------------------+--------------------------------------------------
 Reporter:  richard      |        Owner:  richard  
     Type:  enhancement  |       Status:  assigned 
 Priority:  normal       |    Milestone:  1.1.7    
Component:  QueryParser  |      Version:  SVN trunk
 Severity:  normal       |   Resolution:           
 Keywords:               |    Blockedby:           
 Platform:  All          |     Blocking:           
-------------------------+--------------------------------------------------
Changes (by olly):

  * type:  defect => enhancement
  * milestone:  => 1.1.7


Old description:

> Some code to do this kind of tokenisation is now available at
> http://code.google.com/p/cjk-tokenizer/ which should probably be used as
> the basis for supporting this in Xapian.

New description:

 Some code to do this kind of tokenisation is now available at
 http://code.google.com/p/cjk-tokenizer/ which should probably be used as
 the basis for supporting this in Xapian.

 We could add this as a QueryParser/TermGenerator option without breaking
 API compatibility.  Marking this for consideration later in 1.1.x, but it
 could probably go in 1.2.x, as it's likely to be ABI compatible too.

--

Comment:

 Fabrice Colin said on xapian-discuss:

 Pinot uses a slightly modified version of Yung-Chung Lin's
 cjk-tokenizer that can be found at
 http://svn.berlios.de/wsvn/dijon/trunk/cjkv/CJKVTokenizer.cc

 For an example, see the XapianIndex and TokensIndexer classes at
 http://svn.berlios.de/wsvn/pinot/trunk/IndexSearch/Xapian/XapianIndex.cpp
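
 The linked tokenizers are external code, and the sketch below is not taken
 from either of them. It illustrates one common way to tokenise CJK text,
 assumed here for illustration: split runs of CJK characters into
 overlapping two-character terms (bigrams), and split everything else on
 spaces and punctuation. The function name cjk_bigram_tokenize and the
 Unicode ranges it checks are hypothetical choices, not part of Xapian's API
 or of cjk-tokenizer.

```cpp
#include <string>
#include <vector>

// Decode UTF-8 into code points. Minimal sketch: assumes well-formed input;
// an invalid lead byte or a truncated sequence yields U+FFFD.
static std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        char32_t cp;
        size_t len;
        if (c < 0x80) { cp = c; len = 1; }
        else if ((c >> 5) == 0x6) { cp = c & 0x1F; len = 2; }
        else if ((c >> 4) == 0xE) { cp = c & 0x0F; len = 3; }
        else if ((c >> 3) == 0x1E) { cp = c & 0x07; len = 4; }
        else { out.push_back(0xFFFD); ++i; continue; }
        if (i + len > s.size()) { out.push_back(0xFFFD); break; }
        for (size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Append one code point to a string as UTF-8.
static void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Coarse CJK test covering the main kana, hangul and ideograph blocks.
static bool is_cjk(char32_t cp) {
    return (cp >= 0x3040 && cp <= 0x30FF) ||    // Hiragana, Katakana
           (cp >= 0x3400 && cp <= 0x4DBF) ||    // CJK Extension A
           (cp >= 0x4E00 && cp <= 0x9FFF) ||    // CJK Unified Ideographs
           (cp >= 0xAC00 && cp <= 0xD7AF) ||    // Hangul syllables
           (cp >= 0xF900 && cp <= 0xFAFF) ||    // CJK Compatibility Ideographs
           (cp >= 0x20000 && cp <= 0x2A6DF);    // CJK Extension B
}

// Token boundaries: ASCII non-alphanumerics, CJK punctuation, and the
// halfwidth/fullwidth forms block (a rough approximation).
static bool is_separator(char32_t cp) {
    if (cp < 0x80)
        return !((cp >= '0' && cp <= '9') || (cp >= 'a' && cp <= 'z') ||
                 (cp >= 'A' && cp <= 'Z'));
    return (cp >= 0x3000 && cp <= 0x303F) || (cp >= 0xFF00 && cp <= 0xFFEF);
}

// Hypothetical tokeniser: CJK runs become overlapping bigrams (a lone CJK
// character is kept as a unigram); other text becomes ASCII-lowercased words.
std::vector<std::string> cjk_bigram_tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::vector<char32_t> cjk_run;  // pending run of CJK characters
    std::string word;               // pending non-CJK word

    auto flush_word = [&]() {
        if (!word.empty()) { tokens.push_back(word); word.clear(); }
    };
    auto flush_cjk = [&]() {
        if (cjk_run.size() == 1) {
            std::string t;
            append_utf8(t, cjk_run[0]);
            tokens.push_back(t);
        }
        for (size_t i = 0; i + 1 < cjk_run.size(); ++i) {
            std::string t;
            append_utf8(t, cjk_run[i]);
            append_utf8(t, cjk_run[i + 1]);
            tokens.push_back(t);
        }
        cjk_run.clear();
    };

    for (char32_t cp : decode_utf8(text)) {
        if (is_cjk(cp)) {
            flush_word();
            cjk_run.push_back(cp);
        } else if (is_separator(cp)) {
            flush_word();
            flush_cjk();
        } else {
            flush_cjk();
            if (cp >= 'A' && cp <= 'Z') cp += 'a' - 'A';
            append_utf8(word, cp);
        }
    }
    flush_word();
    flush_cjk();
    return tokens;
}
```

 With this sketch, "Xapian 搜索" would yield the terms "xapian" and "搜索",
 and "中文分词" would yield "中文", "文分" and "分词". Each term could be
 added to a Xapian::Document with add_posting(). The same splitting has to
 be applied at query time, so that a two-character query term matches the
 indexed bigrams without needing a dictionary-based word segmenter.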

-- 
Ticket URL: <http://trac.xapian.org/ticket/180#comment:3>
Xapian <http://xapian.org/>