[Xapian-discuss] Semantics of terms - do they have to be C style, '\0' terminated strings?
olly at survex.com
Thu Sep 11 07:48:53 BST 2008
On Wed, Sep 10, 2008 at 11:21:53PM -0700, David Spencer wrote:
> Is it defined anywhere whether the string you pass to add_term has to be "C
> style" or not?
All handling of "data" strings in the C++ API is zero-byte clean.
The C# and Java bindings currently aren't, though.
And the quartz and flint backends internally do some messing around with
zero bytes in terms, which essentially means that each zero byte in a
term counts twice towards the term length limit at the moment. But the
limit is a bit over 240, so that's rarely an issue.
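
To make the double counting concrete, here is a rough sketch of the arithmetic. The helper names effective_term_length and fits_term_limit are made up for illustration and are not part of the Xapian API, and the limit is passed in as a parameter because the exact backend value isn't stated here. It assumes only that each zero byte is charged twice against the limit.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Xapian function): the length a term would be
// charged against the limit if every zero byte in it counts twice.
std::size_t effective_term_length(const std::string& term) {
    const std::size_t zero_bytes = static_cast<std::size_t>(
        std::count(term.begin(), term.end(), '\0'));
    return term.size() + zero_bytes;
}

// Hypothetical check against a caller-supplied limit; the real backend
// limit is only described above as "a bit over 240".
bool fits_term_limit(const std::string& term, std::size_t limit) {
    return effective_term_length(term) <= limit;
}
```

So under that assumption a 3-byte term containing one zero byte is charged as 4 bytes, and a term made up entirely of zero bytes can only be about half as long as one with none.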
> At a glance the code base doesn't call c_str() much and the cases I saw had
> to do with filenames, so this might be OK,
Yes, filenames can't contain zero bytes, and OS/library calls take
nul-terminated strings as const char * (or similar), so calling c_str()
in such cases isn't a problem.
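
As an illustration of why c_str() only matters at C-style boundaries, the sketch below uses nothing but the standard library. The helper names make_term and c_style_length are invented for this example. A std::string built with an explicit length keeps its embedded zero byte, while anything that reads the c_str() result as a nul-terminated string stops at the first zero byte.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: build a term from raw bytes with an explicit length,
// so embedded zero bytes are kept rather than treated as a terminator.
std::string make_term(const char* bytes, std::size_t length) {
    return std::string(bytes, length);
}

// Hypothetical helper: the length a C-style API would see if handed
// s.c_str(), i.e. the number of bytes before the first zero byte.
std::size_t c_style_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For example, make_term("foo\0bar", 7) has size() 7, but c_style_length() of it is 3. Note that std::string("foo\0bar") without the explicit length is also only 3 bytes long, because that constructor takes a nul-terminated string.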