[Xapian-tickets] [Xapian] #785: Use std::string_view whenever possible

Xapian nobody at xapian.org
Sun Nov 3 03:33:31 GMT 2019


#785: Use std::string_view whenever possible
-------------------------+-------------------------
 Reporter:  Kronuz       |             Owner:  olly
     Type:  enhancement  |            Status:  new
 Priority:  normal       |         Milestone:
Component:  Other        |           Version:
 Severity:  normal       |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+-------------------------

Comment (by olly):

 > I’ll try to profile this to see where string copy constructors and
 string copy assignments are called from within xapian during a real world
 usage and report back the results.

 Did you get anywhere with that?

 Thinking about this some more, the cases where this is likely to help are
 where strings can be long '''and''' currently get copied to store them
 internally, but I can't think of any examples of that.  We'd also need to
 have a copy of the string elsewhere with at least as long a lifetime.

 There are places which take a potentially long string and process it, but
 we almost always pass strings by const reference, so passing a string_view
 is unlikely to be much different (if you pass a const reference to a
 string_view it's really no different, while if you pass by value it avoids
 a level of indirection but requires passing a structure rather than a
 pointer).
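
 As a rough illustration of the two calling conventions compared above,
 here is a minimal sketch.  The function names `count_spaces()` and
 `count_spaces_sv()` are hypothetical and are not part of Xapian; they only
 show what the caller has to supply in each case.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper in the current style: the argument is a reference to
// an existing std::string, so a single pointer is passed.
inline std::size_t count_spaces(const std::string& s) {
    std::size_t n = 0;
    for (char ch : s) n += (ch == ' ');
    return n;
}

// Hypothetical helper taking string_view by value: a (pointer, length) pair
// is passed, and the characters are reached without going via a std::string.
inline std::size_t count_spaces_sv(std::string_view s) {
    std::size_t n = 0;
    for (char ch : s) n += (ch == ' ');
    return n;
}
```

 Either way the character data itself isn't copied.  The string_view form
 can also accept a sub-range of a larger buffer without first building a
 `std::string`, provided the buffer outlives the call.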

 And a lot of strings are very short - e.g. most common terms are short
 enough to be stored inline via SSO so the amount of copying is typically
 just `sizeof(std::string)`.  Both libstdc++ and libc++ implement SSO,
 though somewhat differently, and even the shortest SSO limit of 10 bytes
 is enough for most terms:

 On x86-64:

 * with libstdc++ `sizeof(std::string)` is 32 and SSO is used for <= 15
 bytes
 * with libc++ `sizeof(std::string)` is 24 and SSO is used for <= 22 bytes
 * `sizeof(std::string_view)` is presumably 16 bytes, but requires the
 string is already stored elsewhere

 On x86:

 * with libstdc++ `sizeof(std::string)` is 24 and SSO is used for <= 15
 bytes
 * with libc++ `sizeof(std::string)` is 12, and it looks like SSO is used
 for <= 10 bytes
 * `sizeof(std::string_view)` is presumably 8 bytes, but requires the
 string is already stored elsewhere

 So libc++'s implementation is clearly cleverer space-wise, though there
 may be time costs to the more complex encoding scheme it seems to use.
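
 The figures above can be checked on a given platform with something like
 the sketch below.  It assumes that the capacity of an empty `std::string`
 equals the SSO limit, which holds for libstdc++ and libc++ as far as I
 know but isn't guaranteed by the standard.  The helper names are made up
 for this example.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <string_view>

// Capacity of an empty std::string.  Assumption: this is the largest length
// stored inline (true in practice for libstdc++ and libc++).
inline std::size_t sso_capacity() {
    return std::string().capacity();
}

// Write the sizes for the current compiler and standard library to `out`.
inline void print_string_sizes(std::ostream& out) {
    out << "sizeof(std::string)      = " << sizeof(std::string) << '\n'
        << "sizeof(std::string_view) = " << sizeof(std::string_view) << '\n'
        << "empty string capacity    = " << sso_capacity() << '\n';
}
```

 Calling `print_string_sizes(std::cout)` from a build for each target
 shows whether the numbers listed above apply to that toolchain.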

 We could provide overloaded forms for API methods like
 `TermGenerator::index_text()` which accepted `std::string_view` for
 convenience, and only declare them in the API header based on the value of
 `__cplusplus`.  That case at least is really just syntactic sugar for:

 {{{#!cpp
 termgenerator.index_text(Xapian::Utf8Iterator(strview.data(),
 strview.size()));
 }}}

 So there's really no runtime efficiency improvement to be had there.
 Other cases which currently require a `std::string` might deliver a speed-
 up, but may need more internal replumbing.
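
 A minimal sketch of how such a conditionally declared overload might look
 is below.  `TermGeneratorSketch` and `Utf8IteratorSketch` are hypothetical
 stand-ins so that the example compiles on its own; they are not the real
 Xapian classes, and the `__cplusplus >= 201703L` test is just one possible
 way to do the check.

```cpp
#include <cstddef>
#if __cplusplus >= 201703L
# include <string_view>
#endif

// Hypothetical stand-in for Xapian::Utf8Iterator: records pointer and length.
struct Utf8IteratorSketch {
    const char* p;
    std::size_t len;
    Utf8IteratorSketch(const char* p_, std::size_t len_) : p(p_), len(len_) {}
};

// Hypothetical stand-in for Xapian::TermGenerator.
class TermGeneratorSketch {
    std::size_t bytes_seen = 0;

  public:
    // Existing-style entry point; here it only counts the bytes it is given.
    void index_text(const Utf8IteratorSketch& it) { bytes_seen += it.len; }

#if __cplusplus >= 201703L
    // Only declared when the including code is compiled as C++17 or later.
    // It forwards to the iterator overload, so it is purely a convenience.
    void index_text(std::string_view text) {
        index_text(Utf8IteratorSketch(text.data(), text.size()));
    }
#endif

    std::size_t total_bytes() const { return bytes_seen; }
};
```

 Defining the overload inline in the header means the library itself
 wouldn't need to be built as C++17 for callers to use it.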

--
Ticket URL: <https://trac.xapian.org/ticket/785#comment:6>
Xapian <https://xapian.org/>
Xapian


