[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator
Xapian
nobody at xapian.org
Thu Aug 18 07:33:17 BST 2011
#180: Add support for CJK text to queryparser and termgenerator
-------------------------+--------------------------------------------------
Reporter: richard | Owner: richard
Type: enhancement | Status: assigned
Priority: normal | Milestone: 1.3.0
Component: QueryParser | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blockedby:
Platform: All | Blocking:
-------------------------+--------------------------------------------------
Comment(by bschaefer):
> I've committed the latest patch on a branch in git, cleaned up a few
> things, and fixed a bug with dereferencing an iterator before the end
> check:
Thank you for cleaning it up and finding that bug.
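
For anyone following the ticket, the class of bug mentioned above can be
illustrated with a minimal sketch. This is not the patched code; the helper
name `count_leading` and the use of `std::u32string` are assumptions made
purely for illustration. The point is only that the comparison against
`end()` has to come before the dereference:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xapian): count how many leading
// characters of `s` satisfy `pred`.
template <typename Pred>
std::size_t count_leading(const std::u32string& s, Pred pred) {
    std::size_t n = 0;
    auto it = s.begin();
    // The end check comes first, so `*it` is only evaluated while `it`
    // still points at a real element. Writing `pred(*it) && it != s.end()`
    // would dereference the end iterator on the last pass.
    while (it != s.end() && pred(*it)) {
        ++it;
        ++n;
    }
    return n;
}
```

With this ordering an empty string, or a string where every character
matches, is handled without reading past the end.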
> I noticed an issue with the term positions - currently the code blindly
> assigns a different position to every n-gram it generates, which doesn't
> seem a good approach.
> I'm not sure what the best approach is though. The key thing is we want
> phrases and the NEAR and ADJ operators to work in a natural way for users.
I was also unsure of the best option for assigning term positions. I agree
that phrases and the NEAR and ADJ operators should work with CJK
characters; that will be the next thing I integrate. I think a better
solution will emerge while working on phrases, NEAR and ADJ.
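
To make the position question concrete, here is a minimal sketch comparing
two numbering schemes. It is not the code from the patch: the
`PositionedTerm` struct, the `ngrams` function and the `share_positions`
flag are all hypothetical, it works on already-decoded code points, and it
only emits unigrams and bigrams. Sharing a position between n-grams that
start at the same character is just one possible alternative, not something
that has been decided here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type for illustration; not part of Xapian's API.
struct PositionedTerm {
    std::u32string term;
    unsigned pos;
};

// Emit the unigram and bigram starting at each character of `text`.
// share_positions == false: every n-gram gets its own position (the
//   behaviour described in the quoted comment).
// share_positions == true: n-grams starting at the same character share
//   that character's position (an assumed alternative).
std::vector<PositionedTerm> ngrams(const std::u32string& text,
                                   bool share_positions) {
    std::vector<PositionedTerm> out;
    unsigned next_pos = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned start_pos = ++next_pos;
        out.push_back({text.substr(i, 1), start_pos});
        if (i + 1 < text.size()) {
            unsigned p = share_positions ? start_pos : ++next_pos;
            out.push_back({text.substr(i, 2), p});
        }
    }
    return out;
}
```

For a three-character input the first scheme uses positions 1 to 5, so
adjacent characters end up two positions apart, while the second uses
positions 1 to 3 and keeps adjacent characters at adjacent positions, which
seems closer to what a phrase or NEAR search would expect.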
> But looking at where we call this method, it seems we only ever pass it
> all-CJK input.
> So I propose we scrap the CJK test in there and just n-gram away on
> whatever the input is.
I noticed that too, but left the test in 'just in case'. I fully agree that
it is unnecessary, since the method only ever handles pure CJK input.
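
The reasoning is roughly that if the caller already splits text into CJK
and non-CJK runs, the n-gram step never needs its own test. A minimal
sketch of such a caller follows; `is_cjk_sketch`, `Run` and `split_runs`
are hypothetical names, and the code point ranges are a deliberately rough
subset chosen for illustration, not the test used in the patch.

```cpp
#include <string>
#include <vector>

// Rough illustrative test covering only a few Unicode blocks.
bool is_cjk_sketch(char32_t c) {
    return (c >= 0x4E00 && c <= 0x9FFF)    // CJK Unified Ideographs
        || (c >= 0x3040 && c <= 0x30FF)    // Hiragana and Katakana
        || (c >= 0xAC00 && c <= 0xD7AF);   // Hangul Syllables
}

// A maximal run of characters that are all CJK or all non-CJK.
struct Run {
    std::u32string text;
    bool cjk;
};

// Split `text` into alternating runs. Each run with cjk == true can be
// handed to the n-gram generator as-is, with no further checking there.
std::vector<Run> split_runs(const std::u32string& text) {
    std::vector<Run> runs;
    for (char32_t c : text) {
        bool cjk = is_cjk_sketch(c);
        if (runs.empty() || runs.back().cjk != cjk)
            runs.push_back({std::u32string(), cjk});
        runs.back().text += c;
    }
    return runs;
}
```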
--
Ticket URL: <http://trac.xapian.org/ticket/180#comment:26>
Xapian <http://xapian.org/>