[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator
Xapian
nobody at xapian.org
Fri Sep 25 09:50:05 BST 2009
#180: Add support for CJK text to queryparser and termgenerator
-------------------------+--------------------------------------------------
Reporter: richard | Owner: richard
Type: enhancement | Status: assigned
Priority: high | Milestone: 1.2.0
Component: QueryParser | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blockedby:
Platform: All | Blocking:
-------------------------+--------------------------------------------------
Changes (by xaka):
* cc: xaka2004@… (added)
Comment:
Hi everyone!
About a month ago, the company where I work needed to add a CJKV indexer
to improve its search mechanism. As the backend we use Xapian and the
omega indexer.
I have attached the result of my work integrating '''Dijon CJKVTokenizer'''
into the latest stable Xapian source tree (1.0.16). All tests pass, and the
tokenizer works really well.
What I've done:
* added the '''m4/pkg.m4''' file so that pkg-config can be used to determine
the right CFLAGS and LIBS
* with my patch Xapian depends on glib2, which the CJKV tokenizer uses to
work with unicode/utf-8
* added a check for glib2 at configure time
* extended the LIBS and CFLAGS reported by xapian-config with those of glib2
* added '''include/xapian/cjkv/CJKVTokenizer.h''' from Dijon (I kept the
Dijon namespace) without any changes
* added '''queryparser/CJKVTokenizer.cc''' from Dijon without any changes
* added a modified QueryModifier, which is used to modify the input query (a
bigram model splits each CJKV sequence into tokens; the other parts of the
query are left unchanged). This modifier is applied at parse_query call time
* added a modified Indexer, which is used in TermGenerator (a bigram model
splits each CJKV sequence into terms)
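To illustrate the bigram model mentioned in the last two items, here is a minimal standalone sketch. It is not the Dijon CJKVTokenizer code and does not use glib2. The names decode_utf8, append_utf8, is_cjk and bigram_tokens are hypothetical, the code point ranges are a simplified assumption, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Decode UTF-8 into code points. Assumes well-formed input; a stray
// byte is passed through as a single code point.
static std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        char32_t cp = c;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (c >= 0xF0)      { cp = c & 0x07; extra = 3; }
        else if (c >= 0xE0) { cp = c & 0x0F; extra = 2; }
        else if (c >= 0xC0) { cp = c & 0x1F; extra = 1; }
        ++i;
        for (; extra > 0 && i < s.size(); --extra, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Append one code point to a string as UTF-8.
static void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Simplified assumption: only a few common CJK blocks are recognised.
static bool is_cjk(char32_t cp) {
    return (cp >= 0x3040 && cp <= 0x30FF) ||  // Hiragana, Katakana
           (cp >= 0x3400 && cp <= 0x4DBF) ||  // CJK Extension A
           (cp >= 0x4E00 && cp <= 0x9FFF) ||  // CJK Unified Ideographs
           (cp >= 0xAC00 && cp <= 0xD7AF);    // Hangul Syllables
}

// Split text into tokens: each run of CJK characters becomes overlapping
// bigrams (a run of length one yields that single character); everything
// else is split on ASCII whitespace and otherwise left unchanged.
std::vector<std::string> bigram_tokens(const std::string& text) {
    const std::vector<char32_t> cps = decode_utf8(text);
    std::vector<std::string> tokens;
    std::string word;  // pending non-CJK word
    auto flush_word = [&]() {
        if (!word.empty()) { tokens.push_back(word); word.clear(); }
    };
    std::size_t i = 0;
    while (i < cps.size()) {
        if (is_cjk(cps[i])) {
            flush_word();
            std::size_t end = i;
            while (end < cps.size() && is_cjk(cps[end])) ++end;
            if (end - i == 1) {
                std::string t;
                append_utf8(t, cps[i]);
                tokens.push_back(t);
            } else {
                for (std::size_t j = i; j + 1 < end; ++j) {
                    std::string t;
                    append_utf8(t, cps[j]);
                    append_utf8(t, cps[j + 1]);
                    tokens.push_back(t);
                }
            }
            i = end;
        } else {
            const char32_t c = cps[i];
            if (c == U' ' || c == U'\t' || c == U'\n' || c == U'\r')
                flush_word();
            else
                append_utf8(word, c);
            ++i;
        }
    }
    flush_word();
    return tokens;
}
```

With this sketch, the four-character sequence 中文搜索 is split into the three overlapping bigrams 中文, 文搜 and 搜索, while surrounding non-CJK words pass through as they are. The real tokenizer may treat punctuation and mixed scripts differently.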
To build Xapian you need to:
* '''run "aclocal"''' to regenerate aclocal.m4 so that it includes the added
pkg.m4
* '''run "autoconf"'''
* '''run "automake"'''
* '''make sure that glib2 is installed'''
* '''run "make"'''
I've modified two parts of Xapian: '''QueryParser::Internal::parse_query'''
and '''TermGenerator::Internal::index_text'''. As a result, you just need to
rebuild xapian-core and xapian-omega and you'll get CJKV support.
--
Ticket URL: <http://trac.xapian.org/ticket/180#comment:7>
Xapian <http://xapian.org/>