[Xapian-tickets] [Xapian] #446: TermGenerator: Strange handling of '+' within a word

Xapian nobody at xapian.org
Thu Feb 11 21:59:27 GMT 2010


#446: TermGenerator: Strange handling of '+' within a word
--------------------+-------------------------------------------------------
 Reporter:  cworth  |       Owner:  olly 
     Type:  defect  |      Status:  new  
 Priority:  normal  |   Milestone:       
Component:  Other   |     Version:  1.1.3
 Severity:  normal  |   Blockedby:       
 Platform:  All     |    Blocking:       
--------------------+-------------------------------------------------------
 I asked the TermGenerator to generate terms for a string containing
 " xapian+kanru ". I was surprised that the result was the following
 two terms:

         xapian+
         kanru

 I did note that the documentation[1] of the term generator says that
 a "trailing +" is included in a term. But the handling of the above
 seems inconsistent. It appears that the embedded '+' is first treated
 as a non-word character, splitting the string into "xapian+" and
 "kanru", and is then identified as trailing and so treated as a
 word character, yielding "xapian+".

 I expected the embedded '+' to be treated consistently as a non-word
 character here (it's not a trailing '+'), so the desired result would
 be the two terms "xapian" and "kanru".
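
 The difference between the two behaviours can be sketched with a toy
 model in Python. This is only an illustration and not Xapian's
 tokenizer: the function names observed_terms and expected_terms are
 hypothetical, word characters are simplified to ASCII letters and
 digits, and only a single '+' is considered.

```python
import re

# Toy model only: this is NOT Xapian's tokenizer. Word characters are
# simplified to ASCII letters and digits (an assumption for illustration).
WORD = r"[A-Za-z0-9]+"


def observed_terms(text):
    """Model the reported behaviour: a '+' directly after a word is
    kept on the term, whatever character follows it."""
    return [m.group(0).lower() for m in re.finditer(WORD + r"\+?", text)]


def expected_terms(text):
    """Model the requested behaviour: a '+' is kept only when it is truly
    trailing, i.e. not immediately followed by another word character."""
    pattern = WORD + r"(?:\+(?![A-Za-z0-9]))?"
    return [m.group(0).lower() for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    for sample in (" xapian+kanru ", " xapian+ kanru "):
        print(repr(sample), observed_terms(sample), expected_terms(sample))
```

 In this model the two functions agree whenever the '+' is followed by
 a space or the end of the text, and differ only when the '+' is
 embedded in a word, which is the case this ticket is about.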

 As always, thanks for Xapian!

 -Carl

 [1] http://xapian.org/docs/termgenerator.html

 PS. The above documentation has phrases like "a few other characters"
 in some places. I would love to see those replaced with lists of the
 actual characters so that I could predict correct results by reading
 the documentation.

-- 
Ticket URL: <http://trac.xapian.org/ticket/446>
Xapian <http://xapian.org/>