[Xapian-discuss] Semantics of terms - do they have to be C style, '\0' terminated strings?

Olly Betts olly at survex.com
Thu Sep 11 07:48:53 BST 2008


On Wed, Sep 10, 2008 at 11:21:53PM -0700, David Spencer wrote:
> Is it defined anywhere whether the string you pass to add_term has to be "C
> style" or not?

All handling of "data" strings in the C++ API is zero-byte clean.

The C# and Java bindings currently aren't, though.
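
To make "zero-byte clean" concrete, here is a minimal sketch in plain
standard C++ (no Xapian calls) of the difference between a std::string
carrying an embedded zero byte and the same data seen through c_str().
The helper names make_binary_term, c_style_length and
check_zero_byte_handling are made up for illustration and aren't part
of Xapian.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Build a term with an embedded zero byte.  The (pointer, length)
// constructor is required here: std::string("foo\0bar") would stop
// at the first zero byte and yield just "foo".
inline std::string make_binary_term() {
    return std::string("foo\0bar", 7);
}

// Length of the term as seen through a C-style, nul-terminated view.
inline std::size_t c_style_length(const std::string& term) {
    return std::strlen(term.c_str());
}

// Self-check of the behaviour described in the comments above.
inline void check_zero_byte_handling() {
    const std::string term = make_binary_term();
    assert(term.size() == 7);           // std::string keeps all 7 bytes
    assert(term[3] == '\0');            // including the embedded zero
    assert(c_style_length(term) == 3);  // the C-style view stops at it
}
```

Since Xapian::Document::add_term() takes a const std::string &, a
term built like this reaches the C++ API with its full length intact.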

And the quartz and flint backends internally do some messing around with
zero bytes in terms, which essentially means that each zero byte in a
term counts twice towards the term length limit at the moment.  But the
limit is a bit over 240 bytes, so that's rarely an issue.
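
As a rough illustration of the "counts twice" rule, the sketch below
computes how much of the limit a term would use if every zero byte is
charged as two bytes.  This only mirrors the description above, not
the actual backend code; effective_term_length and fits_term_limit
are hypothetical helpers, and the limit is passed in as a parameter
rather than hard-coded, since the exact value isn't stated here.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Bytes a term is charged against the limit, assuming (as described
// above) that each zero byte counts twice.
inline std::size_t effective_term_length(const std::string& term) {
    const std::size_t zero_bytes = static_cast<std::size_t>(
        std::count(term.begin(), term.end(), '\0'));
    return term.size() + zero_bytes;
}

// True if the term fits within a caller-supplied limit (hypothetical
// parameter; check the backend documentation for the real figure).
inline bool fits_term_limit(const std::string& term, std::size_t limit) {
    return effective_term_length(term) <= limit;
}
```

For example, the 7-byte term "foo\0bar" would be charged as 8 bytes
under this rule, while a term with no zero bytes is charged its plain
length.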

> At a glance the code base doesn't call c_str() much and the cases I saw had
> to do with filenames, so this might be OK,

Yes, filenames can't contain zero bytes, and OS/library calls take
nul-terminated strings as const char * (or similar), so calling c_str()
in such cases isn't a problem.

Cheers,
    Olly


