In C, the first character of a string variable is '0', but the '0' disappears when printing?


I am a beginner in C. I wrote the following code to convert uppercase letters to lowercase and copy the result into a new variable. Variable c2 is only declared and initialized and is never passed to the function, but when I print it, its first character, '0', has disappeared, which confuses me. The code is as follows:

#include <stdio.h>

void toLowerCase(const char *s, char *c);
int main()
{
  char c1[] = "0Xe8Fa";
  char c2[] = "0Xe8Fa";
  char c3[] = "";

  toLowerCase(c1, c3);

  printf("c1 = %s\n", c1); // c1 = 0Xe8Fa
  printf("c2 = %s\n", c2); // c2 = xe8fa  why did the zero disappear?
  printf("c3 = %s\n", c3); // c3 = 0xe8fa
  return 0;
}

void toLowerCase(const char *s, char *c)
{
  int i;
  i = 0;
  while (s[i] != '\0')
  {
    if (s[i] >= 'A' && s[i] <= 'Z')
    {
      c[i] = s[i] + ('a' - 'A');
    }
    else
    {
      c[i] = s[i];
    }
    ++i;
  }
  c[i] = '\0';
}


I would like to know why this happens.


Solution

  • Here's a trick to help you see exactly why it is happening. I added the following line:

        printf("addr of c1=%p, addr of c2=%p, addr of c3=%p\n", (void *)c1, (void *)c2, (void *)c3);
    

    The "%p" conversion prints a pointer value, which here is the address where each array starts. Here's my output:

    $ gcc x.c
    $ ./a.out
    addr of c1=0x7ffef0c1389a, addr of c2=0x7ffef0c138a1, addr of c3=0x7ffef0c13899
    c1 = xe8fa
    c2 = 0Xe8Fa
    c3 = 0xe8fa
    

    Notice that my output is different from yours. In your run c2 is corrupted, while in mine c1 is corrupted. Let's look at my results first.

    If you look at those addresses, you'll see that c3 is at the lowest address of the three, ending with 899. Then c1 with address ending with 89a, which is one higher. Then c2 with address ending with 8a1, which is 7 higher. It's interesting that the C compiler did not arrange the variables in memory in the same order that they are declared. But the compiler doesn't HAVE to arrange them in the declared order unless they are members of a structure. So GCC didn't do anything wrong here.

    Here's how I would draw the memory layout of the variables (only showing the last 3 digits of the address):

    c2: 8a1 (? bytes)
    c1: 89a (7 bytes)
    c3: 899 (1 byte)
    

    Given these addresses and the source code, we can infer that the compiler allocated 1 byte for c3 and 7 bytes for c1. The program output doesn't give us direct evidence of c2's allocation size, but we can assume it is 7 bytes also.

    As the program executes, it copies data from c1 to c3, lowercasing as it goes. The first byte, '0', goes into address 899, which is c3[0]. The second byte, 'X' converted to 'x', goes into address 89a, which is c1[0]. And so on. Each character lands in c1 one position before the place it was read from, which is why c1 ends up as xe8fa with the leading 0 gone. You are writing past the end of c3 because it has only 1 byte allocated, and c1 happened to sit in memory just past c3, so it got clobbered.

    So, why is your output different from mine? Apparently the same basic thing is happening, but your compiler arranged your variables in a different order, so that c2 rather than c1 sits just past c3. I assume you're using a compiler different from mine (maybe Microsoft?).

    Finally, be aware of the limitations of this trick. Compilers often insert padding (unused space) between variables. Or, suppose that c3 ended up at the highest address. Writing past the end of it would not have touched c1 or c2. Which is to say that printing these pointers can sometimes help you understand a bug or unexpected behavior, but you can't count on it. As has already been mentioned, writing past the end of an array is undefined behavior. And with undefined behavior, anything is possible.