[Xapian-tickets] [Xapian] #741: "Empty termnames aren't allowed" by indexing text in Arabic

Xapian nobody at xapian.org
Wed Dec 14 06:06:04 GMT 2016


#741: "Empty termnames aren't allowed" by indexing text in Arabic
-------------------------+-----------------------------
 Reporter:  Kelson       |             Owner:  olly
     Type:  defect       |            Status:  assigned
 Priority:  normal       |         Milestone:  1.4.2
Component:  Library API  |           Version:  1.4.1
 Severity:  normal       |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  Linux
-------------------------+-----------------------------
Changes (by olly):

 * cc: assem (added)


Comment:

 The problematic word in `wrong.txt` consists of a single 'ARABIC TATWEEL'
 (U+0640) character, which indeed stems to an empty string.

 I'd argue that's a bug in the Arabic stemmer (I've Cc:-ed assem who wrote
 that algorithm - what do you think, Assem?)

 But we should handle this case better (especially as we support
 user-implemented stemming algorithms).  At a minimum the error message
 should be improved, but I think overall it makes sense to just skip empty
 stems if they arise.
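
 A minimal sketch of the "skip empty stems" idea, in Python for brevity.
 This is not Xapian's code or API: `toy_stem` and `stems_for_indexing` are
 hypothetical names, and `toy_stem` is only a stand-in that reproduces the
 symptom (a word made solely of U+0640 stemming to an empty string).

```python
TATWEEL = "\u0640"  # ARABIC TATWEEL


def toy_stem(word: str) -> str:
    """Hypothetical stand-in for a stemmer: drops every tatweel character.

    Not the real Arabic stemmer; it only mimics the reported behaviour that
    a word consisting of a single U+0640 stems to the empty string.
    """
    return word.replace(TATWEEL, "")


def stems_for_indexing(words, stem=toy_stem):
    """Return the stems to index, skipping words whose stem is empty.

    `stem` can be any callable mapping str -> str, so a user-implemented
    stemmer that returns "" is handled the same way.
    """
    stems = []
    for word in words:
        stemmed = stem(word)
        if not stemmed:
            # An empty stem cannot be used as a term, so skip it rather
            # than passing it on and triggering an error.
            continue
        stems.append(stemmed)
    return stems
```

 With this guard, a lone tatweel in the input is dropped and the remaining
 words are still indexed, instead of the whole document failing.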

--
Ticket URL: <https://trac.xapian.org/ticket/741#comment:5>
Xapian <https://xapian.org/>