[Xapian-tickets] [Xapian] #46: zero byte cleanliness in C# and Java bindings
Xapian
nobody at xapian.org
Mon Jul 25 13:12:54 BST 2011
#46: zero byte cleanliness in C# and Java bindings
-----------------------------+----------------------------------------------
Reporter: olly | Owner: olly
Type: defect | Status: assigned
Priority: normal | Milestone: 1.2.x
Component: Xapian-bindings | Version: SVN trunk
Severity: minor | Resolution:
Keywords: | Blockedby:
Platform: All | Blocking:
-----------------------------+----------------------------------------------
Old description:
> Current status:
>
> Java (SWIG-based): \0 in Java -> \xc0\x80 in Xapian, which seems to
> disappear from the returned string on output
>
> Tcl: \0 in Tcl -> \xc0\x80 in Xapian, which reappears as \0 in Tcl on
> when returned
>
> C#: Truncates at \0 on input
>
> ----
> ''Original description:''
>
> Check for zero byte cleanness wherever strings are used. There are a
> number of c_str()s in the code, but I believe all in the core library
> are harmless at 2002-04-29. There may be other zero byte issues
> though. xapian-applications/dbtools also uses c_str() where it should
> probably use data() and length(). xapian-bindings hasn't been checked.
New description:
Current status:
Java (SWIG-based): \0 in Java -> \xc0\x80 in Xapian, which reappears as \0
in Java when returned
Tcl: \0 in Tcl -> \xc0\x80 in Xapian, which reappears as \0 in Tcl when
returned
C#: Truncates at \0 on input
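
For reference, \xc0\x80 is the overlong two-byte form of U+0000, as used
by "modified UTF-8". The arithmetic can be sketched in C++ as below;
decode_two_byte is a hypothetical helper for illustration only, not part
of Xapian or the bindings:

```cpp
#include <cstdint>

// Hypothetical helper (not a Xapian API): decode a two-byte UTF-8
// sequence 110xxxxx 10yyyyyy into the code point xxxxxyyyyyy.
// No validation is done, so overlong forms such as C0 80 are accepted.
inline std::uint32_t decode_two_byte(unsigned char b0, unsigned char b1) {
    return (static_cast<std::uint32_t>(b0 & 0x1f) << 6) | (b1 & 0x3f);
}
```

With b0 = 0xc0 and b1 = 0x80 both masked values are zero, so the result
is code point 0, which is why the binding layer can map it back to \0.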
----
''Original description:''
Check for zero byte cleanness wherever strings are used. There are a
number of c_str()s in the code, but I believe all in the core library
are harmless at 2002-04-29. There may be other zero byte issues
though. xapian-applications/dbtools also uses c_str() where it should
probably use data() and length(). xapian-bindings hasn't been checked.
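
A minimal sketch of why c_str() alone is not zero-byte clean while
data() plus length() is. The function names are made up for this
illustration and do not come from the Xapian sources:

```cpp
#include <cstring>
#include <string>

// Length as seen by code that treats c_str() as a C string:
// strlen() stops at the first zero byte.
inline std::size_t length_via_c_str(const std::string& s) {
    return std::strlen(s.c_str());
}

// Copy built from c_str() alone: truncated at the first zero byte.
inline std::string copy_via_c_str(const std::string& s) {
    return std::string(s.c_str());
}

// Copy built from data() and length(): embedded zero bytes survive.
inline std::string copy_via_data(const std::string& s) {
    return std::string(s.data(), s.length());
}
```

For a five-byte string holding "ab", a zero byte, then "cd", the c_str()
variants see only "ab", while the data()/length() copy keeps all five
bytes.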
--
Comment(by olly):
I must have had an unclean tree or something. SWIG-based Java bindings
are just like Tcl - from the Java side they appear zero-byte clean, but
actually in C++ we see \xc0\x80 for Java \0.
I tried a quick patch to use GetStringCritical() and convert to UTF-8
ourselves using Xapian's Unicode support, which would mean \0 in Java
<-> \0 in C++, and would also convert surrogate pairs in Java's
representation properly to/from UTF-8 in C++. However, when timing some
operations which do a lot of string passing, this is twice as slow, so I'm
parking it for now. I'll attach it here so it doesn't get lost.
Added a Java testcase in r15917, which just checks that the roundtripping
works.
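
For illustration, the Java-to-C++ direction of that conversion (UTF-16
code units to standard UTF-8, combining surrogate pairs and emitting a
single zero byte for \0) might look roughly like the sketch below. This
is not the attached patch and does not use Xapian's Unicode support or
JNI; append_utf8 and utf16_to_utf8 are hypothetical names, and replacing
unpaired surrogates with U+FFFD is an assumption of this sketch:

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: append code point cp (<= 0x10FFFF) as UTF-8.
inline void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);  // U+0000 becomes one zero byte
    } else if (cp < 0x800) {
        out += static_cast<char>(0xc0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xe0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3f));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    } else {
        out += static_cast<char>(0xf0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3f));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3f));
        out += static_cast<char>(0x80 | (cp & 0x3f));
    }
}

// Hypothetical helper: convert UTF-16 code units to standard UTF-8.
inline std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < in.size() &&
            in[i + 1] >= 0xdc00 && in[i + 1] <= 0xdfff) {
            // High surrogate followed by low surrogate: combine them.
            cp = 0x10000 + ((cp - 0xd800) << 10) + (in[i + 1] - 0xdc00);
            ++i;
        } else if (cp >= 0xd800 && cp <= 0xdfff) {
            // Unpaired surrogate: assumption, substitute U+FFFD.
            cp = 0xfffd;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

A supplementary character then comes out as one four-byte UTF-8 sequence
instead of two three-byte sequences, and \0 stays a single zero byte.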
--
Ticket URL: <http://trac.xapian.org/ticket/46#comment:21>
Xapian <http://xapian.org/>