[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator

Xapian nobody at xapian.org
Fri Sep 25 09:50:05 BST 2009


#180: Add support for CJK text to queryparser and termgenerator
-------------------------+--------------------------------------------------
 Reporter:  richard      |        Owner:  richard  
     Type:  enhancement  |       Status:  assigned 
 Priority:  high         |    Milestone:  1.2.0    
Component:  QueryParser  |      Version:  SVN trunk
 Severity:  normal       |   Resolution:           
 Keywords:               |    Blockedby:           
 Platform:  All          |     Blocking:           
-------------------------+--------------------------------------------------
Changes (by xaka):

 * cc: xaka2004@… (added)


Comment:

 Hi everyone!
 About a month ago, the company where I work needed a CJKV indexer to
 improve our search. We use Xapian as the backend, together with the omega
 indexer.

 I have attached the result of my work integrating '''Dijon CJKVTokenizer'''
 into the latest stable Xapian source tree (1.0.16). All tests pass, and the
 tokenizer works really well.

 What I've done:

 * added '''m4/pkg.m4''' so that pkg-config can be used to determine the
 right CFLAGS and LIBS

 * with my patch, Xapian depends on glib2, which the CJKV tokenizer uses to
 work with Unicode/UTF-8

 * added a check for glib2 at configure time

 * extended the LIBS and CFLAGS reported by xapian-config with those of glib2

 * added '''include/xapian/cjkv/CJKVTokenizer.h''' from Dijon (I kept the
 Dijon namespace) with any touches

 * added '''queryparser/CJKVTokenizer.cc''' from Dijon without any touches

 * added a modified QueryModifier, which is used to modify the input query
 (a bigram model splits each CJKV sequence into tokens; the other parts of
 the query are left unchanged). This modifier is applied at parse_query
 call time

 * added a modified Indexer, which is used in TermGenerator (a bigram model
 splits each CJKV sequence into terms)
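
 To make the bigram model mentioned above concrete, here is a minimal,
 standalone sketch. It is not the Dijon CJKVTokenizer code and does not use
 glib2 or any Xapian API. The names split_utf8, is_cjk and bigram_tokens are
 hypothetical, the code point ranges treated as CJKV are an assumption, and
 the input is assumed to be well-formed UTF-8. Each run of CJKV characters
 becomes overlapping two-character tokens; everything else is split on
 whitespace and passed through unchanged.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One UTF-8 character: its decoded code point and its original bytes.
using Utf8Char = std::pair<char32_t, std::string>;

// Split a UTF-8 string into characters. Assumes well-formed input;
// no validation is done (hypothetical helper, not part of the patch).
static std::vector<Utf8Char> split_utf8(const std::string& s) {
    std::vector<Utf8Char> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
        if (i + len > s.size()) len = s.size() - i;  // truncated tail
        // Keep only the payload bits of the lead byte (5, 4 or 3 bits).
        char32_t cp = len == 1 ? lead : lead & (0xFF >> (len + 1));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.emplace_back(cp, s.substr(i, len));
        i += len;
    }
    return out;
}

// Assumed set of CJKV ranges; the real tokenizer may use different ones.
static bool is_cjk(char32_t cp) {
    return (cp >= 0x3040 && cp <= 0x30FF) ||   // Hiragana, Katakana
           (cp >= 0x3400 && cp <= 0x4DBF) ||   // CJK Extension A
           (cp >= 0x4E00 && cp <= 0x9FFF) ||   // CJK Unified Ideographs
           (cp >= 0xAC00 && cp <= 0xD7AF) ||   // Hangul Syllables
           (cp >= 0xF900 && cp <= 0xFAFF) ||   // CJK Compatibility Ideographs
           (cp >= 0x20000 && cp <= 0x2FFFF);   // Supplementary ideographs
}

// Emit overlapping bigrams for CJKV runs, whole words for everything else.
std::vector<std::string> bigram_tokens(const std::string& text) {
    std::vector<std::string> tokens;
    std::vector<std::string> run;  // current run of CJKV characters
    std::string word;              // current non-CJKV word

    auto flush_word = [&] {
        if (!word.empty()) tokens.push_back(word);
        word.clear();
    };
    auto flush_run = [&] {
        // A lone CJKV character has no neighbour, so emit it by itself.
        if (run.size() == 1) tokens.push_back(run[0]);
        for (std::size_t i = 0; i + 1 < run.size(); ++i)
            tokens.push_back(run[i] + run[i + 1]);
        run.clear();
    };

    for (const Utf8Char& c : split_utf8(text)) {
        if (is_cjk(c.first)) {
            flush_word();
            run.push_back(c.second);
        } else {
            flush_run();
            if (c.first == ' ' || c.first == '\t' || c.first == '\n')
                flush_word();
            else
                word += c.second;
        }
    }
    flush_run();
    flush_word();
    return tokens;
}
```

 With this sketch, bigram_tokens("xapian 中文搜索 omega") yields the tokens
 "xapian", "中文", "文搜", "搜索" and "omega". Using the same splitting when
 indexing and when parsing a query is what lets the generated terms match.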

 To build Xapian you need to:

 * '''call "aclocal"''' to regenerate aclocal.m4 and include added pkg.m4

 * '''call "autoconf"'''

 * '''call "automake"'''

 * '''make sure that you have glib2 installed'''

 * '''call "make"'''

 I've modified two parts of Xapian: '''QueryParser::Internal::parse_query'''
 and '''TermGenerator::Internal::index_text'''. As a result, you just need
 to rebuild xapian-core and xapian-omega and you'll get CJKV support.

-- 
Ticket URL: <http://trac.xapian.org/ticket/180#comment:7>
Xapian <http://xapian.org/>