
Where does C++ really store a string if the char array that holds it is smaller than the string?


I'm testing an example about strings in C++ from the "C++ Premiere" book.

const int size = 9;
char name1[size];
char name2[size] = "C++owboy";   // 8 characters here

cout << "Howdy! I'm " << name2 << "! What's your name?" << endl;

cin >> name1;  // I input "Qwertyuiop": 10 chars, 11 bytes with the '\0'. That is more than the size of the name1 array.

// now I do cout
cout << "Well, your name has " << strlen(name1) << " letters";  // "Your name has 10 letters".
cout << " and is stored in an array of " << sizeof(name1) << " bytes"; // ...stored in an array of 9 bytes.

How can 11 bytes (10 chars plus the '\0') be stored in an array sized for just 8 chars plus the '\0'? Does the array become wider at compile time? Or is the string stored somewhere else?

Also, I can't do:

const int size = 9;
char name2[size] = "C++owboy_12345";   // assign 14 characters to 9 chars array

But I can do what I wrote above:

cin >> name1;   // any length string into an array of smaller size

What is the trick here? I use NetBeans and the Cygwin g++ compiler.


Solution

  • This is a typical buffer overflow. This is why you're always supposed to check the size of input if you're putting it in a buffer. Here is what's happening:

    In C++ (and C), an array name decays to a pointer to the array's first element in most expressions. The compiler knows the size of the array and can do some compile-time checks, which is why the oversized initializer for name2 is rejected. At run time, though, the code that reads the input only sees a char*.

    When you did cin >> name1, you passed a char* to cin. cin doesn't know how big the allocated space is -- all it has is a pointer to some memory. So, it assumes you allocated enough space, writes everything, and goes past the end of the array. Here's a picture:

    Bytes   1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
    Before  |-----name1 array-------|  |--- other data-|
    After   Q  w  e  r  t  y  u  i  o  p  \0 |-er data-|
    

    As you can see, you have overwritten other data that was stored after the array. Formally, writing past the end of an array is undefined behavior, so anything can happen. Sometimes the overwritten data is just junk, but other times it's important, and the result is a tricky bug. It is also a security vulnerability, because an attacker can overwrite program memory with user input.

    The confusion about sizes arises because strlen counts bytes until it finds a '\0' (the null terminator), so it reports 10 characters no matter how big the array is. sizeof(name1), on the other hand, is the actual size of the array, 9 bytes, which the compiler knows at compile time.

    Because of these problems, well-designed C functions that take an array as an argument usually also take the array size; otherwise there is no way to tell how big the array is. In C++, it's much better to avoid the problem altogether by using objects like std::string.