[Xapian-tickets] [Xapian] #741: "Empty termnames aren't allowed" by indexing text in Arabic
Xapian
nobody at xapian.org
Wed Dec 14 06:06:04 GMT 2016
#741: "Empty termnames aren't allowed" by indexing text in Arabic
-------------------------+-----------------------------
 Reporter:  Kelson       |             Owner:  olly
     Type:  defect       |            Status:  assigned
 Priority:  normal       |         Milestone:  1.4.2
Component:  Library API  |           Version:  1.4.1
 Severity:  normal       |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  Linux
-------------------------+-----------------------------
Changes (by olly):
* cc: assem (added)
Comment:
The problematic word in `wrong.txt` consists of a single 'ARABIC TATWEEL'
(U+0640) character, which indeed stems to an empty string.
I'd argue that's a bug in the Arabic stemmer (I've Cc:-ed assem who wrote
that algorithm - what do you think, Assem?)
But we should handle this case better (especially as we support
user-implemented stemming algorithms). At a minimum the error message
should be improved, but I think overall it makes sense to just skip empty
stems if they arise.
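
To illustrate the "skip empty stems" idea, here is a minimal, self-contained
sketch. It is not Xapian's code: `toy_stem` and `stemmed_terms` are
hypothetical names, and the toy stemmer only strips U+0640 so that it
reproduces the empty-stem case described above.

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL


def toy_stem(word):
    """Hypothetical stand-in for a stemmer: it only strips tatweel."""
    return word.replace(TATWEEL, "")


def stemmed_terms(words, stem=toy_stem):
    """Return the stems of `words`, skipping any stem that is empty."""
    terms = []
    for word in words:
        stemmed = stem(word)
        if not stemmed:
            # An empty stem cannot be used as a term name, so skip it
            # instead of passing it on and triggering an error.
            continue
        terms.append(stemmed)
    return terms
```

With this guard, a word consisting only of U+0640 is dropped silently, and
other words are still stemmed and kept. The same check would apply to any
user-supplied `stem` callable.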
--
Ticket URL: <https://trac.xapian.org/ticket/741#comment:5>
Xapian <https://xapian.org/>