Tags: c++, string, c-strings, string-length, strlen

Difference between strlen(str.c_str()) and str.length() for std::string


I had always assumed that every implementation of std::string must satisfy strlen(str.c_str()) == str.length() for every string str.

What does the C++ standard say about this? (Does it?)

Background: At least the implementations shipped with Visual C++ and gcc do not have this property. Consider this example:

// Output:
// string says its length is: 13
// strlen says: 5
#include <cstring>
#include <iostream>
#include <string>

int main() {
  std::string str = "Hello, world!";
  // Overwrite the ',' at index 5 with a null character; the length stays 13.
  str[5] = '\0';
  std::cout << "string says its length is: " << str.length() << std::endl;
  // std::strlen stops counting at the first '\0' it finds.
  std::cout << "strlen says: " << std::strlen(str.c_str()) << std::endl;
  return 0;
}

Of course, writing a null character into the string without str "noticing" is what causes "the problem". But that's not my question. I want to know what the standard has to say about this behavior.


Solution

  • Your understanding is incorrect. Sort of.

    std::string may contain chars with the value '\0'; when you extract a C-string, you have no way of knowing how long it is other than to scan for \0s, which by necessity cannot account for "binary data".

    This is a limitation of strlen, and one that std::string "fixes" by actually remembering this metadata as a count of chars that it knows are encapsulated.

    The standard doesn't really need to "say" anything about it, except that std::string::length gives you the string length, no matter what the value of the chars you inserted into the string, and that it is not prohibited to insert a '\0'. By contrast, strlen is defined to tell you how many chars exist before the first \0, which is a fundamentally different definition.

    There is no explicit wording about this, because there does not need to be. If there were an exception to the very simple rules ("there is a string; it has chars; it can tell you how many chars it has") then that would be stated explicitly… and it's not.