[Xapian-discuss] xapian supports Chinese language

Olly Betts olly at survex.com
Wed Apr 8 12:36:11 BST 2009


On Wed, Apr 08, 2009 at 05:08:31PM +0800, Li Yong wrote:
> I want to use Xapian to index Chinese HTML pages.
> 
> I found the cjk-tokenizer lib on the mailing list:
> http://lists.tartarus.org/pipermail/xapian-discuss/2007-June/003921.html
> 
> However, I do not know how to add this lib to the Xapian project.

That's just a link to the CJK tokenizer in Lucene.

This one might be more useful:

http://thread.gmane.org/gmane.comp.search.xapian.general/4574/focus=4762

> Is there any example or steps?

I've not tried to use it myself.

The longer term plan is to include this or something similar in Xapian
itself, but nobody is currently working on it as far as I know.

For now, I think you'd have to bypass Xapian::TermGenerator and
Xapian::QueryParser entirely: split the text into bigram terms yourself,
add them with Xapian::Document::add_posting() when indexing, and combine
them into queries with Xapian::Query::OP_AND.

Cheers,
    Olly


