[Xapian-tickets] [Xapian] #355: non-spacing chars are not term splitters

Xapian nobody at xapian.org
Mon Mar 30 16:31:00 BST 2009


#355: non-spacing chars are not term splitters
--------------------+-------------------------------------------------------
 Reporter:  alsadi  |       Owner:  olly
     Type:  defect  |      Status:  new 
 Priority:  normal  |   Milestone:      
Component:  Other   |     Version:      
 Severity:  normal  |   Blockedby:      
 Platform:  All     |    Blocking:      
--------------------+-------------------------------------------------------
 I was evaluating Xapian for indexing Arabic documents and noticed
 that terms are being chopped off.
 The reason is that characters such as U+0651 ARABIC SHADDA (stress marker),
 which is in the Unicode category "Mark, Non-Spacing" (Mn),
 are not treated by is_wordchar as part of a word, so the word
 is split at that character.
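
 To illustrate the effect, here is a minimal sketch in Python. It is not
 Xapian's actual code: is_wordchar_strict, is_wordchar_with_marks and
 split_terms are hypothetical names, and the exact set of categories that
 Xapian's is_wordchar accepts is an assumption here. It only shows how a
 word containing a non-spacing mark gets split when that category is not
 counted as a word character.

```python
import unicodedata


def is_wordchar_strict(ch):
    # Assumed rule: only letters (L*), numbers (N*) and connector
    # punctuation (Pc) count as word characters.
    cat = unicodedata.category(ch)
    return cat[0] in ("L", "N") or cat == "Pc"


def is_wordchar_with_marks(ch):
    # Same rule, but non-spacing marks (Mn) are also kept inside a word.
    return is_wordchar_strict(ch) or unicodedata.category(ch) == "Mn"


def split_terms(text, is_wordchar):
    # Toy term splitter: a term is a maximal run of word characters.
    terms, current = [], []
    for ch in text:
        if is_wordchar(ch):
            current.append(ch)
        elif current:
            terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


# MEEM, HAH, MEEM, SHADDA (U+0651), DAL
word = "\u0645\u062d\u0645\u0651\u062f"

print(unicodedata.category("\u0651"))
print(split_terms(word, is_wordchar_strict))
print(split_terms(word, is_wordchar_with_marks))
```

 With the strict rule the word is broken in two at the shadda; once Mn
 characters are accepted it stays a single term.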

 The patch is trivial.

 Thanks to Olly Betts (IRC:ojwb) for helping me with it.

-- 
Ticket URL: <http://trac.xapian.org/ticket/355>
Xapian <http://xapian.org/>


