[Xapian-tickets] [Xapian] #355: non-spacing chars are not term splitters
Xapian
nobody at xapian.org
Mon Mar 30 16:31:00 BST 2009
#355: non-spacing chars are not term splitters
--------------------+-------------------------------------------------------
Reporter: alsadi | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone:
Component: Other | Version:
Severity: normal | Blockedby:
Platform: All | Blocking:
--------------------+-------------------------------------------------------
I was evaluating Xapian for indexing Arabic documents
and noticed that terms are being split in the middle of words.
The reason is that characters such as U+0651 ARABIC SHADDA (the
consonant-doubling mark), which is in the Unicode category
"Mark, Non-Spacing" (Mn), are not treated by is_wordchar as part of
a word, so any word containing one is split at that character.
The patch is trivial.
Thanks to Olly Betts (IRC: ojwb) for helping me with it.
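
To illustrate the behaviour described above, here is a minimal Python sketch.
It is not Xapian's code and not the patch itself: is_wordchar and split_terms
below are hypothetical stand-ins written for this example, and the exact set of
categories Xapian accepts is an assumption. It only shows how leaving the
Mn category out of a word-character test splits a word at U+0651.

```python
import unicodedata

# Assumed set of "word" categories for this illustration only:
# letters and numbers. Xapian's real list may differ.
LETTERS_AND_NUMBERS = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nd", "Nl", "No"}


def is_wordchar(ch, include_marks):
    """Hypothetical word-character test; optionally accepts Mn marks."""
    cat = unicodedata.category(ch)
    if cat in LETTERS_AND_NUMBERS:
        return True
    # "Mn" is the Unicode general category "Mark, Non-Spacing".
    return include_marks and cat == "Mn"


def split_terms(text, include_marks):
    """Split text into runs of consecutive word characters."""
    terms, current = [], []
    for ch in text:
        if is_wordchar(ch, include_marks):
            current.append(ch)
        elif current:
            terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


# An Arabic word containing U+0651 ARABIC SHADDA as its fourth code point.
word = "\u0645\u062D\u0645\u0651\u062F"

print(split_terms(word, include_marks=False))  # word is cut at the shadda
print(split_terms(word, include_marks=True))   # word stays whole
```

With include_marks=False the word comes back as two fragments, which matches
the splitting reported here; with include_marks=True it is kept as one term.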
--
Ticket URL: <http://trac.xapian.org/ticket/355>
Xapian <http://xapian.org/>