My simple C program is as follows. Initially, I've defined the variable buf1 with room for 3 chars.
I don't have any problem storing 2 characters such as AB or XY.
user@linux:~/c# cat buff.c; gcc buff.c -o buff; echo -e '\n'; ./buff
#include <stdio.h>
#include <string.h>
int main() {
char buf1[3] = "AB";
printf("buf1 val: %s\n", buf1);
printf("buf1 addr: %p\n", &buf1);
strcpy(buf1,"XY");
printf("buf1 val: %s\n", buf1);
}
buf1 val: AB
buf1 addr: 0xbfe0168d
buf1 val: XY
user@linux:~/c#
Unfortunately, when I copy 3 characters such as XYZ, I get the following warning when compiling the program.
buff.c:8:2: warning: ‘__builtin_memcpy’ writing 4 bytes into a region of size 3 overflows the destination [-Wstringop-overflow=]
strcpy(buf1,"XYZ");
Isn't XYZ considered to be 3 bytes? Why does the message say 4 bytes instead of 3 bytes?
user@linux:~/c# cat buff.c; gcc buff.c -o buff; echo -e '\n'; ./buff
#include <stdio.h>
#include <string.h>
int main() {
char buf1[3] = "AB";
printf("buf1 val: %s\n", buf1);
printf("buf1 addr: %p\n", &buf1);
strcpy(buf1,"XYZ");
printf("buf1 val: %s\n", buf1);
}
buff.c: In function ‘main’:
buff.c:8:2: warning: ‘__builtin_memcpy’ writing 4 bytes into a region of size 3 overflows the destination [-Wstringop-overflow=]
strcpy(buf1,"XYZ");
^~~~~~~~~~~~~~~~~~
buf1 val: AB
buf1 addr: 0xbfdb34fd
buf1 val: XYZ
Segmentation fault
user@linux:~/c#
You're forgetting that C strings are null-terminated. sizeof "AB" is 3 and sizeof "XYZ" is 4, due to the implicit terminating byte. (The type of the string literal "AB" is char[3] and the type of "XYZ" is char[4].)
Had you not specified any length for buf1, it would also have been 3 bytes long:
char buf1[] = "AB"; // here exactly the same as char buf1[3] = "AB";
The memory layout would be
buf1
v
+-------+-------+-------+
| [0] | [1] | [2] |
+-------+-------+-------+
| 'A' | 'B' | '\0' |
+-------+-------+-------+
Now, strcpy copies the terminating null character as well (C11 7.24.2.3p2):
- The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1. If copying takes place between objects that overlap, the behavior is undefined.
This means that 4 bytes are copied in total, but there is space for only 3. The code therefore has undefined behaviour, which is why the compiler produces the diagnostic. C11 7.1.4p2 (Use of library functions) says:
[...] If a function argument is described as being an array, the pointer actually passed to the function shall have a value such that all address computations and accesses to objects (that would be valid if the pointer did point to the first element of such an array) are in fact valid.[...]
In the actual code, the implicit access to buf1[3] is not valid.
Memory layout after strcpy:
buf1
v
+-------+-------+-------+-------+
| [0] | [1] | [2] | ??? |
+-------+-------+-------+-------+
| 'X' | 'Y' | 'Z' | '\0' |
+-------+-------+-------+-------+
The warning mentions __builtin_memcpy because the C compiler optimized this code: it replaced the strcpy of a string of known length with a memcpy of known length, since memcpy generates more efficient code.
And finally, you can fit 3 characters into char buf1[3]; by using strncpy, but then the buffer has no room for the terminating null character, so it cannot be printed with a plain printf("%s"). You can still print it by specifying an explicit precision that is less than or equal to the length of the array, which limits how many bytes printf reads. Adding a field width as well pads shorter values:
#include <stdio.h>
#include <string.h>
int main() {
char buf1[3] = "AB";
/* .3 is the precision (read at most 3 bytes); -3 is the width (pad to 3 columns, left-aligned) */
printf("buf1 val: >%-3.3s<\n", buf1);
printf("buf1 addr: %p\n", &buf1);
strncpy(buf1, "XYZ", 3);
printf("buf1 val: >%-3.3s<\n", buf1);
}
Compiling and running it:
% gcc strncpy.c -Wall -Wextra
% ./a.out
buf1 val: >AB <
buf1 addr: 0x7ffd7f6aecc5
buf1 val: >XYZ<
Note the one extra space character printed after AB: the field width of 3 pads the two-character string.