[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator
Xapian
nobody at xapian.org
Sat Apr 2 07:10:22 BST 2016
#180: Add support for CJK text to queryparser and termgenerator
-------------------------+------------------------------
Reporter: richard | Owner: richard
Type: enhancement | Status: closed
Priority: normal | Milestone: 1.2.22
Component: QueryParser | Version: SVN trunk
Severity: normal | Resolution: fixed
Keywords: | Blocked By:
Blocking: | Operating System: All
-------------------------+------------------------------
Description changed by mariakatosvich:
Old description:
> Some code to do this kind of tokenisation is now available at
> http://code.google.com/p/cjk-tokenizer/ which should probably be used as
> the basis for supporting this in Xapian.
>
> We could add this as a !QueryParser/!TermGenerator option without
> breaking API compatibility. Marking for considering later in 1.1.x, but
> it could probably go in 1.2.x as it's likely to be ABI compatible too.
New description:
Some code to do this kind of tokenisation is now available at
http://code.google.com/p/cjk-tokenizer/ which should probably be used as
the basis for supporting this in Xapian.

We could add this as a !QueryParser/!TermGenerator option without breaking
API compatibility. Marking for considering later in 1.1.x, but it could
probably go in 1.2.x as it's likely to be ABI compatible too.

I think it's probably better to have the user select "CJKV-mode". Exploding
every string being indexed into a vector and then scanning it to see if CJKV
characters are present is going to add a lot of overhead for everyone, even
those indexing non-CJKV text.
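
The trade-off described above can be sketched roughly as follows. This is a minimal Python illustration, not Xapian's code: `is_cjk`, `bigrams`, `tokenise` and the `cjk_mode` parameter are hypothetical names, the character ranges are deliberately incomplete, and overlapping bigrams are only assumed as the n-gram scheme. The point is that the per-character CJK check runs only when the caller opts in.

```python
import re
from itertools import groupby


def is_cjk(ch):
    """Hypothetical, simplified range check (a real one covers more blocks)."""
    cp = ord(ch)
    return (0x4E00 <= cp <= 0x9FFF      # CJK Unified Ideographs
            or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
            or 0xAC00 <= cp <= 0xD7AF)  # Hangul syllables


def bigrams(run):
    """Overlapping two-character terms; a lone character is kept as-is."""
    if len(run) == 1:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


def tokenise(text, cjk_mode=False):
    """Split text into terms; only inspect characters when cjk_mode is set."""
    words = re.findall(r"\w+", text)
    if not cjk_mode:
        # Default path: no per-character CJK check, so non-CJK users
        # do not pay for it.
        return words
    terms = []
    for word in words:
        # Split each word into alternating CJK and non-CJK runs.
        for cjk, group in groupby(word, key=is_cjk):
            run = "".join(group)
            terms.extend(bigrams(run) if cjk else [run])
    return terms
```

With the flag off, a CJK string is left as one term; with it on, the same string is expanded into overlapping bigrams while non-CJK words pass through unchanged.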
--
Ticket URL: <https://trac.xapian.org/ticket/180#comment:35>
Xapian <//xapian.org/>