[Xapian-tickets] [Xapian] #741: "Empty termnames aren't allowed" by indexing text in Arabic

Xapian nobody at xapian.org
Sat Dec 17 05:01:47 GMT 2016


#741: "Empty termnames aren't allowed" by indexing text in Arabic
-------------------------+-----------------------------
 Reporter:  Kelson       |             Owner:  olly
     Type:  defect       |            Status:  assigned
 Priority:  normal       |         Milestone:  1.4.2
Component:  Library API  |           Version:  1.4.1
 Severity:  normal       |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  Linux
-------------------------+-----------------------------

Comment (by olly):

 {{{
 $ perl -CO -e 'print "A\x{60c}B\x{61b}C\x{61f}D\x{66a}E\x{66b}F\x{66c}G"'|examples/simpleindex ar.db
 $ xapian-delve ar.db -r1
 Term List for record #1: Za Zb Zc Zd Ze Zf Zg a b c d e f g
 }}}

 So Xapian currently splits words at all of those characters. (`simpleindex`
 is hard-wired to use the English stemmer, so it's not that the Arabic
 stemmer is stripping them.)
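
 For reference, here is a rough Python sketch of what the test string contains
 and why splitting at those code points yields the unstemmed terms `a` to `g`.
 It is not Xapian's tokenizer: `split_at_non_alnum` is a hypothetical stand-in
 that assumes a word break at every non-alphanumeric character, and
 `describe_separators` only looks up each separator in the Unicode database.

```python
import unicodedata

# The same test string as the perl one-liner: Latin letters separated by
# six Arabic punctuation code points.
SAMPLE = "A\u060cB\u061bC\u061fD\u066aE\u066bF\u066cG"


def describe_separators(text):
    """Return (code point, general category, name) for each non-alphanumeric char."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
        if not ch.isalnum()
    ]


def split_at_non_alnum(text):
    """Hypothetical stand-in for a tokenizer (NOT Xapian's actual logic):
    lowercase the input and break words at every non-alphanumeric character."""
    words, current = [], []
    for ch in text:
        if ch.isalnum():
            current.append(ch.lower())
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    for row in describe_separators(SAMPLE):
        print(*row)
    print(split_at_non_alnum(SAMPLE))
```

 The `Z`-prefixed entries in the term list above are the stemmed forms of the
 same words, which this sketch does not attempt to reproduce.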

--
Ticket URL: <https://trac.xapian.org/ticket/741#comment:11>
Xapian <https://xapian.org/>