[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator
Xapian
nobody at xapian.org
Mon Jul 25 20:59:27 BST 2011
#180: Add support for CJK text to queryparser and termgenerator
-------------------------+--------------------------------------------------
Reporter: richard | Owner: richard
Type: enhancement | Status: assigned
Priority: normal | Milestone: 1.2.x
Component: QueryParser | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blockedby:
Platform: All | Blocking:
-------------------------+--------------------------------------------------
Comment(by bschaefer):
Updated the patch with everything you wanted changed. The parser was a much
better place to put it. I only added one rule to the CFG:
"compound_term(T) ::= CJKTERM(U)." and then made a function under Term to
produce the n-grams. I can also see where to add some more rules to get
better integration of CJK, but for now this is a good start to see if this
is the direction you want.
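
To illustrate the n-gram idea outside the patch itself, here is a minimal
sketch in Python of splitting a run of CJK characters into overlapping
n-grams. The function name cjk_ngrams and the max_n parameter are
hypothetical and are not taken from the patch; the actual function under
Term may choose n and order its output differently.

```python
def cjk_ngrams(text, max_n=2):
    """Return overlapping n-grams (n = 1..max_n) for a run of CJK characters.

    Hypothetical helper for illustration only; not the code from the patch.
    """
    chars = list(text)  # a Python str iterates by Unicode code point
    terms = []
    for start in range(len(chars)):
        for n in range(1, max_n + 1):
            # Only emit n-grams that fit entirely inside the run.
            if start + n <= len(chars):
                terms.append("".join(chars[start:start + n]))
    return terms


print(cjk_ngrams("中文搜"))  # ['中', '中文', '文', '文搜', '搜']
```

With max_n=2 each character yields a unigram plus a bigram with its
successor, so a query for any two adjacent characters can match without a
real word segmenter.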
If there are other things you would like me to change, let me know. I also
added to the regression tests for the term generator and query parser,
though a few things would still need to be added to the grammar to allow
things such as NEAR to be used with CJK. Still, this is a big step forward
from no CJK support at all. It would also be helpful if you knew anyone
fluent in any of the CJK languages, for better testing of the semantics of
the languages and of special cases and symbols. Then again, the real
solution to this is proper segmentation.
--
Ticket URL: <http://trac.xapian.org/ticket/180#comment:20>
Xapian <http://xapian.org/>
Xapian