I've seen several usage of fgets
(for example, here) that go like this:
char buff[7]="";
(...)
fgets(buff, sizeof(buff), stdin);
The interest being that, if I supply a long input like "aaaaaaaaaaa", fgets
will truncate it to "aaaaaa" here, because the 7th character will be used to store '\0'
.
However, when doing this:
int i=0;
for (i=0;i<7;i++)
{
buff[i]='a';
}
printf("%s\n",buff);
I will always get 7 'a'
s, and the program will not crash. But if I try to write 8 'a'
s, it will.
As I saw it later, the reason for this is that, at least on my system, when I allocate char buff[7]
(with or without =""
), the 8th byte (counting from 1, not from 0) gets set to 0. From what I guess, things are done like this precisely so that a for
loop with 7 writes, followed by a string formatted read, could succeed, whether the last character to be written was '\0'
or not, and thus avoiding the need for the programmer to set the last '\0' himself, when writing chars individually.
From this, it follows that in the case of
fgets(buff, sizeof(buff), stdin);
and then providing a too long input, the resulting buff
string will automatically have two '\0'
characters, one inside the array, and one right after it that was written by the system.
I have also observed that doing
fgets(buff,(sizeof(buff)+17),stdin);
will still work, and output a very long string, without crashing. From what I guessed, this is because fgets
will keep writing until sizeof(buff)+17
, and the last char to be written will precisely be a '\0'
, ensuring that any forthcoming string reading process would terminate properly (although the memory is messed up anyway).
But then, what about fgets(buff, (sizeof(buff)+1),stdin);
? this would use up all the space that was rightfully allocated in buff
, and then write a '\0'
right after it, thus overwriting...the '\0'
previously written by the system. In other words, yes, fgets
would go out of bounds, but it can be proven that when adding only one to the length of the write, the program will never crash.
So in the end, here comes the question: why does fgets
always terminates its write with a '\0'
, when another '\0'
, placed by the system right after the array, already exists? why not do like in the one by one for
-loop based write, that can access the whole of the array and write anything the programmer wants, without endangering anything?
Thank you very much for your answer!
EDIT: indeed, there is no proof possible, as long as I do not know whether this 8th '\0'
that mysteriously appears upon allocation of buff[7], is part of the C standard or not, specifically for string arrays. If not, then...it's just luck that it works :-)
but it can be proven that when adding only one to the length of the write, the program will never crash.
No! You can't prove that! Not in the sense of a mathematical proof. You have only shown that on your system, with your compiler, with those particular compiler settings you used, with particular environment configuration, it might not crash. This is far from a mathematical proof!
In fact the C standard itself, although it guarantees that you can get the address of "one place after the last element of an array", it also states that dereferencing that address (i.e. trying to read or write from that address) is undefined behaviour.
That means that an implementation can do everything in this case. It can even do what you expect with naive reasoning (i.e. work - but it's sheer luck), but it may also crash or it may also format your HD (if your are very, very unlucky). This is especially true when writing system software (e.g. a device driver or a program running on the bare metal), i.e. when there is no OS to shield you from the nastiest consequences of writing bad code!
Edit This should answer the question made in a comment (C99 draft standard):
7.19.7.2 The fgets function
Synopsis
#include <stdio.h> char *fgets(char * restrict s, int n, FILE * restrict stream);
Description
The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.
Returns
The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.
Edit: Since it seems that the problem lies in a misunderstanding of what a string is, this is the relevant excerpt from the standard (emphasis mine):
7.1.1 Definitions of terms
A string is a contiguous sequence of characters terminated by and including the first null character. The term multibyte string is sometimes used instead to emphasize special processing given to multibyte characters contained in the string or to avoid confusion with a wide string. A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.