[Xapian-tickets] [Xapian] #180: Add support for CJK text to queryparser and termgenerator
Xapian
nobody at xapian.org
Fri Sep 25 11:37:53 BST 2009
#180: Add support for CJK text to queryparser and termgenerator
-------------------------+--------------------------------------------------
Reporter: richard | Owner: richard
Type: enhancement | Status: assigned
Priority: high | Milestone: 1.2.0
Component: QueryParser | Version: SVN trunk
Severity: normal | Resolution:
Keywords: | Blockedby:
Platform: All | Blocking:
-------------------------+--------------------------------------------------
Comment(by xaka):
Updated patch attached.
1. Where should I put the CJKV header and source files?
2. Yes, the glib2 dependency is not good, because Xapian already has its own
Unicode/UTF-8 API. I agree, but I have no time at the moment to completely
rework the CJKV code, which is why I've integrated Dijon's code "as is".
One thing to note: the Dijon/glib2 code is used only if the document
contains CJKV sequences, i.e. it is 99% backward compatible for non-CJKV
documents :).
3. How and where should the user select CJKV mode? What if the user just
has a big folder with many files that are updated every day, and this big
folder is reindexed every day? Or another example: international forums.
There is no way to say "index this file/topic in CJKV mode". We can try to
optimize the process of scanning for and detecting CJKV sequences.
4. About your alternative: it's already done in the patch (if I understand
you correctly). If the string to be indexed doesn't contain CJKV, the old
algorithm is used.
Put simply: "No CJKV - the patch is not used and everything stays as is.
If there is CJKV - we use the modified queryparser/termgenerator code".
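
As a rough illustration of the dispatch described above (not the code in
the patch), the check could look something like the sketch below. The
function names, the "cjkv"/"default" labels and the list of code point
ranges are all assumptions made for this example; the patch may well detect
CJKV sequences differently.

```python
# Illustrative code point ranges only (an assumption for this sketch,
# not the list used by the patch or by Dijon's tokenizer).
CJKV_RANGES = (
    (0x3040, 0x309F),    # Hiragana
    (0x30A0, 0x30FF),    # Katakana
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xAC00, 0xD7AF),    # Hangul Syllables
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B
)


def is_cjkv(ch):
    """Return True if the single character ch falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJKV_RANGES)


def contains_cjkv(text):
    """Scan text and stop at the first CJKV character found."""
    return any(is_cjkv(ch) for ch in text)


def choose_tokenizer(text):
    """Hypothetical dispatch: CJKV path only when CJKV is present."""
    return "cjkv" if contains_cjkv(text) else "default"
```

Because the scan stops at the first CJKV character, text that contains
CJKV near its start is classified quickly, while text without any CJKV
costs one pass over the string before the old algorithm is used.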
Let's continue discussing all of this; I think I can help to complete the
CJKV integration. The major work is done. Minor work remains...
--
Ticket URL: <http://trac.xapian.org/ticket/180#comment:9>
Xapian <http://xapian.org/>