[Xapian-discuss] Integrated Chinese tokenizer SCWS in xapian-core
hightman
hightman at zuaa.zju.edu.cn
Wed Sep 14 06:40:25 BST 2011
Xapian is an excellent open source search engine library, but its QueryParser and TermGenerator have no native support for Chinese word segmentation.
So I modified a small amount of the xapian-core source code to integrate the SCWS tokenizer, which is also open source and was developed by me.
Anyone can obtain the patch from the URL below. After patching, Xapian::QueryParser::parse_query and Xapian::TermGenerator::index_text support Chinese word segmentation directly.
https://github.com/hightman/xunsearch/blob/master/xapian-scws/patch.xapian-core-scws
I hope this is useful to Chinese users of Xapian.
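As a rough illustration of the problem the patch addresses: Chinese text has no spaces between words, so splitting on whitespace yields one unusable token per sentence. The sketch below is NOT the SCWS algorithm and is not part of the patch. It is a toy forward-maximum-matching segmenter, and the function name `segment`, the `max_len` parameter, and the five-word dictionary are all hypothetical, chosen only to show the kind of word list a segmenter would hand to an indexer or query parser.

```python
def segment(text, dictionary, max_len=4):
    """Toy forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character if nothing matches."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until there is a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical toy dictionary, for illustration only.
TOY_DICT = {"中文", "分词", "搜索", "引擎", "搜索引擎"}

sentence = "中文搜索引擎"
print(sentence.split())             # whitespace split: the whole sentence as one token
print(segment(sentence, TOY_DICT))  # dictionary-based split into words
```

A real segmenter such as SCWS relies on a much larger dictionary and additional rules; the point here is only that some segmentation step has to run before terms can be indexed or queries parsed.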
----------
The following is about xunsearch, which is built on xapian-core and scws.
It includes two back-end servers written in C/C++ and a front-end development library written in PHP, and provides an easier-to-use search engine solution for Chinese users.
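xunsearch (迅搜) is a full-text search engine solution developed in C/C++ on top of xapian and scws, with a development interface for PHP.
It aims to help ordinary developers quickly and conveniently build their own full-text search engine over large existing data sets. Full-text search can help you reduce the search load on your servers and greatly improve search speed and user experience.
It supports high-speed retrieval over massive data sets, is powerful and easy to use, and is open source and free! All of the code is hosted on github.
This is still a beta version, intended mainly for testing. It may still contain bugs or other problems, so do not use it in production.
Download: http://www.xunsearch.com/download/
Documentation: http://www.xunsearch.com/doc/
Git repository: http://github.com/hightman/xunsearch/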