[Xapian-tickets] [Xapian] #741: "Empty termnames aren't allowed" by indexing text in Arabic
Xapian
nobody at xapian.org
Sat Dec 17 05:01:47 GMT 2016
#741: "Empty termnames aren't allowed" by indexing text in Arabic
-------------------------+-----------------------------
Reporter: Kelson | Owner: olly
Type: defect | Status: assigned
Priority: normal | Milestone: 1.4.2
Component: Library API | Version: 1.4.1
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: Linux
-------------------------+-----------------------------
Comment (by olly):
{{{
$ perl -CO -e 'print "A\x{60c}B\x{61b}C\x{61f}D\x{66a}E\x{66b}F\x{66c}G"'|examples/simpleindex ar.db
$ xapian-delve ar.db -r1
Term List for record #1: Za Zb Zc Zd Ze Zf Zg a b c d e f g
}}}
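So Xapian currently splits words at all of those characters (U+060C ARABIC
COMMA, U+061B ARABIC SEMICOLON, U+061F ARABIC QUESTION MARK, U+066A ARABIC
PERCENT SIGN, U+066B ARABIC DECIMAL SEPARATOR and U+066C ARABIC THOUSANDS
SEPARATOR). `simpleindex` is hard-wired to use the English stemmer, so it
isn't the Arabic stemmer that is stripping them.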
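As a rough illustration of the behaviour in the transcript, here is a minimal Python sketch that treats any character whose Unicode general category is punctuation (`P*`) as a word separator. It only uses the standard `unicodedata` module; the helper `split_on_punctuation` is hypothetical and is not Xapian's actual `TermGenerator` logic. It also leaves out the stemmed `Z`-prefixed terms that `simpleindex` adds.

```python
import unicodedata

# The test string from the transcript: Latin letters separated by
# Arabic punctuation characters.
TEXT = "A\u060cB\u061bC\u061fD\u066aE\u066bF\u066cG"


def split_on_punctuation(text):
    """Hypothetical simplified tokenizer (not Xapian's): split text at any
    character in a Unicode punctuation category and lowercase each word."""
    words, current = [], []
    for ch in text:
        if unicodedata.category(ch).startswith("P"):
            # Punctuation ends the current word, if there is one.
            if current:
                words.append("".join(current).lower())
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current).lower())
    return words


# Show the code point, general category and name of each separator.
for ch in TEXT:
    if not ch.isalpha():
        print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

print(split_on_punctuation(TEXT))  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```

All six separators are in category `Po` (other punctuation), so any tokenizer that breaks words at punctuation will produce the same unprefixed terms `a` to `g` seen in the `xapian-delve` output.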
--
Ticket URL: <https://trac.xapian.org/ticket/741#comment:11>
Xapian <https://xapian.org/>