[Xapian-discuss] Integrated Chinese tokenizer SCWS in xapian-core

hightman hightman at zuaa.zju.edu.cn
Wed Sep 14 06:40:25 BST 2011


Xapian is an excellent open source search engine library, but it has no native support for Chinese word segmentation in queryparser and termgenerator.

Therefore, I modified a small amount of the source code to integrate the SCWS tokenizer, which is also open source and was developed by me.

Anyone can obtain the patch from the URL below. After patching, Xapian::QueryParser::parse_query and Xapian::TermGenerator::index_text support Chinese word segmentation directly.

https://github.com/hightman/xunsearch/blob/master/xapian-scws/patch.xapian-core-scws
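For readers unfamiliar with the problem: Chinese text has no spaces between words, so a tokenizer that splits on whitespace sees a whole phrase as one term. The sketch below is only a toy illustration of dictionary-based segmentation using forward maximum matching. It is not the SCWS algorithm and not code from the patch; the function names, the tiny dictionary and the max_chars limit are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Byte length of the UTF-8 sequence that starts with lead byte c.
static std::size_t utf8_len(unsigned char c) {
    if (c < 0x80) return 1;
    if ((c >> 5) == 0x06) return 2;
    if ((c >> 4) == 0x0E) return 3;
    if ((c >> 3) == 0x1E) return 4;
    return 1;  // invalid lead byte: treat it as a single byte
}

// Split a UTF-8 string into characters, each kept as its own byte string.
static std::vector<std::string> split_chars(const std::string& text) {
    std::vector<std::string> chars;
    for (std::size_t i = 0; i < text.size();) {
        std::size_t n = utf8_len(static_cast<unsigned char>(text[i]));
        if (i + n > text.size()) n = text.size() - i;  // truncated input
        chars.push_back(text.substr(i, n));
        i += n;
    }
    return chars;
}

// Toy forward maximum matching (hypothetical helper, not SCWS):
// at each position take the longest dictionary word of at most
// max_chars characters, falling back to a single character.
std::vector<std::string> segment(const std::string& text,
                                 const std::set<std::string>& dict,
                                 std::size_t max_chars = 4) {
    assert(max_chars >= 1);
    const std::vector<std::string> chars = split_chars(text);
    std::vector<std::string> words;
    std::size_t i = 0;
    while (i < chars.size()) {
        std::string best = chars[i];
        std::size_t take = 1;
        std::string candidate;
        for (std::size_t k = 0; k < max_chars && i + k < chars.size(); ++k) {
            candidate += chars[i + k];
            if (dict.count(candidate)) {
                best = candidate;
                take = k + 1;
            }
        }
        words.push_back(best);
        i += take;
    }
    return words;
}
```

With a dictionary containing 全文, 搜索, 引擎 and 搜索引擎, calling segment("全文搜索引擎", dict) returns 全文 followed by 搜索引擎. A real tokenizer such as SCWS does considerably more than this.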

I hope this is useful to Chinese users of Xapian.

---------- 

The following message is about xunsearch, which is built on xapian-core and scws.
It includes two back-end servers written in C/C++ and a front-end development library written in PHP, and provides an easier-to-use search engine solution for Chinese users.

Xunsearch (迅搜) is a full-text search engine solution written in C/C++ on top of xapian and scws, with a development interface for PHP.

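It aims to help ordinary developers quickly and easily build their own full-text search engine over large existing data sets. Full-text search can reduce the search load on your servers and greatly improve search speed and user experience.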

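It supports high-speed search over very large data sets, is powerful and easy to use, and is free and open source. All of the code is hosted on github.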

This is still a beta release, intended mainly for testing. It may still contain bugs or other problems, so do not use it in production.

Download: http://www.xunsearch.com/download/
Documentation: http://www.xunsearch.com/doc/
Git repository: http://github.com/hightman/xunsearch/




