[Xapian-discuss] Chinese, Japanese, Korean Tokenizer.

头太晕 torrycn at gmail.com
Thu Jun 14 02:53:57 BST 2007


2007/6/14, Olly Betts <olly at survex.com>:
>
> On Wed, Jun 06, 2007 at 09:58:19AM +0800, 头太晕 wrote:
> > When a Chinese word is UTF-8 encoded, QueryParser.parse_query() has a
> > problem: it does not output the correct Chinese word.
>
> Can you provide an example query string I can reproduce this with?
>
> You'll probably need to tell me what you expect the output to be too.
>
> Cheers,
>     Olly
>

Example:
There are three terms in a database: "沙发", "沙子", and "你好".

The query string is "沙*".

The code, in Python:

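# -*- coding: utf-8 -*-
import os
import xapian

# Xapian does not expand '~', so resolve the home directory first.
db = xapian.Database(os.path.expanduser('~/dbtest'))
queryparser = xapian.QueryParser()
# The parser needs the database to expand wildcards into matching terms.
queryparser.set_database(db)
querystring = u'沙*'.encode('utf-8')
query = queryparser.parse_query(querystring, xapian.QueryParser.FLAG_WILDCARD)
print(query.get_description())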

The result I expect is:
Xapian::Query((沙发 OR 沙子))
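To make the expected expansion concrete, here is a minimal sketch in plain Python that does not use Xapian at all. It only illustrates the behaviour I am hoping for: "沙*" should match every term whose UTF-8 bytes begin with the bytes of "沙". The names expand_wildcard and describe are hypothetical helpers made up for this illustration; they are not part of the Xapian API, and this is not how Xapian implements wildcards internally.

```python
# -*- coding: utf-8 -*-
# Illustration only: plain Python, not Xapian's implementation.


def expand_wildcard(pattern, terms):
    """Return the terms matching a pattern with an optional trailing '*'.

    Matching is done on UTF-8 bytes, and results are sorted in byte order.
    """
    if not pattern.endswith(u'*'):
        # No wildcard: the pattern matches only an identical term.
        return [pattern] if pattern in terms else []
    prefix = pattern[:-1].encode('utf-8')
    hits = [t for t in terms if t.encode('utf-8').startswith(prefix)]
    return sorted(hits, key=lambda t: t.encode('utf-8'))


def describe(terms):
    """Format matches like the description expected above (hypothetical)."""
    return u'Xapian::Query((%s))' % u' OR '.join(terms)


terms = [u'沙发', u'沙子', u'你好']
matches = expand_wildcard(u'沙*', terms)
print(describe(matches))
```

With the three terms from my example, this picks out "沙发" and "沙子" and leaves "你好" out, which is the grouping I expect parse_query() to report.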


Regards

