[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator
Xapian
nobody at xapian.org
Thu Aug 18 07:06:36 BST 2011
-------------------------+--------------------------------------------------
Reporter: richard | Owner: richard
Type: enhancement | Status: assigned
Priority: normal | Milestone: 1.3.0
Component: QueryParser | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blockedby:
Platform: All | Blocking:
-------------------------+--------------------------------------------------
Comment(by olly):
It looks to me like CJK::tokenizer::tokenize() will just ignore non-CJK
characters and, for example, produce a bigram from two CJK characters with
arbitrary non-CJK text between them.
But looking at where we call this method, it seems we only ever pass it
all-CJK input.
So I propose we scrap the CJK test in tokenize() and just n-gram away on
whatever the input is.
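Under the proposal, the tokenizer would simply n-gram its input. A minimal
sketch of that, again hypothetical (`ngrams()` is an illustrative name, not
a Xapian function) and working on pre-decoded UTF-32 text. Emitting a run
shorter than n as a single term is an assumption of this sketch, not
something stated in the ticket.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical n-gram generator with no CJK test at all. It relies on the
// caller having already split out a run of CJK text, as the existing call
// sites appear to do.
std::vector<std::u32string> ngrams(const std::u32string& text, std::size_t n = 2) {
    assert(n > 0);
    std::vector<std::u32string> out;
    if (text.empty()) return out;
    if (text.size() < n) {
        // Assumption: a run shorter than n is emitted whole, so that e.g.
        // a single-character word still yields a term.
        out.push_back(text);
        return out;
    }
    // Every window of n adjacent code points becomes one term.
    for (std::size_t i = 0; i + n <= text.size(); ++i)
        out.push_back(text.substr(i, n));
    return out;
}
```

Because there is no CJK test, any non-CJK character that did reach this
function would be n-grammed together with its neighbours rather than
silently bridged over, so the design depends on callers continuing to pass
all-CJK runs.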
Thoughts?
--
Ticket URL: <http://trac.xapian.org/ticket/180#comment:25>
Xapian <http://xapian.org/>