[Xapian-tickets] [Xapian] #446: TermGenerator: Strange handling of '+' within a word
Xapian
nobody at xapian.org
Thu Feb 11 21:59:27 GMT 2010
#446: TermGenerator: Strange handling of '+' within a word
--------------------+-------------------------------------------------------
Reporter: cworth | Owner: olly
Type: defect | Status: new
Priority: normal | Milestone:
Component: Other | Version: 1.1.3
Severity: normal | Blockedby:
Platform: All | Blocking:
--------------------+-------------------------------------------------------
I asked the TermGenerator to generate terms for a string containing
" xapian+kanru ". I was surprised to get the following two terms as
the result:
xapian+
kanru
I did note that the term generator documentation[1] says that a
"trailing +" is included in a term. But the handling of the above
seems inconsistent. It appears that the embedded '+' is first treated
as a non-word character to split the string into "xapian+" and "kanru"
and then the '+' is identified as trailing, so is considered a
word-character to yield "xapian+".
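The two-step behaviour guessed at above can be modelled with a small, self-contained sketch. This is not Xapian's code: the function name terms_observed is hypothetical, and treating only alphanumerics as word characters is a simplifying assumption. It only illustrates how attaching any '+' that directly follows a word yields the reported terms.

```python
def terms_observed(text):
    """Toy model of the reported behaviour (an assumption, not Xapian's code)."""
    terms = []
    i, n = 0, len(text)
    while i < n:
        # Skip anything that is not a word character.
        if not text[i].isalnum():
            i += 1
            continue
        start = i
        # Consume the run of word characters.
        while i < n and text[i].isalnum():
            i += 1
        # Attach every '+' that directly follows the word,
        # without looking at what comes after it.
        while i < n and text[i] == "+":
            i += 1
        terms.append(text[start:i].lower())
    return terms


print(terms_observed(" xapian+kanru "))  # ['xapian+', 'kanru']
```

Under these assumptions the '+' is attached to "xapian" purely because it follows a word, even though another word starts immediately after it.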
I expected the embedded '+' to be treated consistently as a non-word
character here (it's not a trailing '+'), so the desired result would
be the two terms "xapian" and "kanru".
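The expected rule can be sketched in the same toy style. Again, this is only an illustration under stated assumptions: terms_expected is a hypothetical name, word characters are approximated as alphanumerics, and nothing here is taken from Xapian's implementation. The difference from the reported behaviour is a single look-ahead: a run of '+' is kept only when no word character follows it.

```python
def terms_expected(text):
    """Toy model of the requested behaviour (an assumption, not Xapian's code)."""
    terms = []
    i, n = 0, len(text)
    while i < n:
        # Skip anything that is not a word character.
        if not text[i].isalnum():
            i += 1
            continue
        start = i
        # Consume the run of word characters.
        while i < n and text[i].isalnum():
            i += 1
        end = i
        # Look ahead over any run of '+' after the word.
        j = i
        while j < n and text[j] == "+":
            j += 1
        # Keep the '+' run only if it is truly trailing, i.e. it is
        # followed by end of input or by a non-word character.
        if j > i and (j == n or not text[j].isalnum()):
            end = j
        i = j
        terms.append(text[start:end].lower())
    return terms


print(terms_expected(" xapian+kanru "))  # ['xapian', 'kanru']
print(terms_expected("c++ code"))        # ['c++', 'code']
```

With this look-ahead, an embedded '+' acts as a plain separator, while a genuinely trailing one (as in "c++") is still kept on the term.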
As always, thanks for Xapian!
-Carl
[1] http://xapian.org/docs/termgenerator.html
PS. The above documentation has phrases like "a few other characters"
in some places. I would love to see those replaced with lists of the
actual characters so that I could predict correct results by reading
the documentation.
--
Ticket URL: <http://trac.xapian.org/ticket/446>
Xapian <http://xapian.org/>