[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator

Xapian nobody at xapian.org
Thu Aug 18 07:33:17 BST 2011


#180: Add support for CJK text to queryparser and termgenerator
-------------------------+--------------------------------------------------
 Reporter:  richard      |        Owner:  richard  
     Type:  enhancement  |       Status:  assigned 
 Priority:  normal       |    Milestone:  1.3.0    
Component:  QueryParser  |      Version:  SVN trunk
 Severity:  normal       |   Resolution:           
 Keywords:               |    Blockedby:           
 Platform:  All          |     Blocking:           
-------------------------+--------------------------------------------------

Comment(by bschaefer):

 > I've committed the latest patch on a branch in git, cleaned up a few
 > things, and fixed a bug with dereferencing an iterator before the end
 > check:

 Thank you for cleaning it up and finding that bug.

 > I noticed an issue with the term positions - currently the code blindly
 > assigns a different position to every n-gram it generates, which doesn't
 > seem a good approach.
 > I'm not sure what the best approach is though.  The key thing is we want
 > phrases and the NEAR and ADJ operators to work in a natural way for users.


 I was also unsure of the best option for assigning term positions. I agree
 that phrases and the NEAR and ADJ operators should work with CJK
 characters, and integrating them is the next step. I expect a better
 solution to emerge while working on phrases, NEAR and ADJ.
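
 To make the position question concrete, here is a rough Python sketch (not
 the Xapian code). It assumes the generator emits unigrams and bigrams, and
 the function names are made up for illustration. The first scheme mirrors
 the "different position for every n-gram" behaviour described above; the
 second is only one possible alternative, where n-grams share the position
 of their first character. Nothing here has been settled on in this ticket.

```python
def ngrams(text, max_n=2):
    """Yield (start_index, ngram) for every n-gram of length 1..max_n."""
    for start in range(len(text)):
        for n in range(1, max_n + 1):
            if start + n <= len(text):
                yield start, text[start:start + n]


def positions_per_ngram(text, max_n=2):
    """Every generated n-gram gets its own, ever-increasing position."""
    return [(gram, pos)
            for pos, (_, gram) in enumerate(ngrams(text, max_n), start=1)]


def positions_per_char(text, max_n=2):
    """Hypothetical alternative: an n-gram takes the (1-based) position
    of its first character, so overlapping n-grams share positions."""
    return [(gram, start + 1) for start, gram in ngrams(text, max_n)]


if __name__ == "__main__":
    for gram, pos in positions_per_ngram("中文字"):
        print("per-ngram", pos, gram)
    for gram, pos in positions_per_char("中文字"):
        print("per-char ", pos, gram)
```

 With the first scheme, adjacent characters end up two positions apart, which
 is what makes phrase, NEAR and ADJ matching awkward. With the second,
 adjacent characters are one position apart, as a user would expect.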


 > But looking at where we call this method, it seems we only ever pass it
 > all-CJK input.
 > So I propose we scrap the CJK test in there and just n-gram away on
 > whatever the input is.

 I saw that too and left the test in 'just in case', but I fully agree that
 it is unnecessary, since the method is only ever passed pure CJK input.

-- 
Ticket URL: <http://trac.xapian.org/ticket/180#comment:26>
Xapian <http://xapian.org/>