It is usually said that scanf is not a safe function. Clang and GCC do not issue any warnings, but MSVC refuses to compile the code (unless _CRT_SECURE_NO_WARNINGS is defined):
Error C4996 'scanf': This function or variable may be unsafe.
Does this mean that in some cases scanf is guaranteed not to overflow and in others not?
Another function for reading data is gets. Are there cases where gets can be used safely, or should it be avoided altogether?
fgets is usually suggested as an alternative to scanf and gets. How is fgets more secure?
scanf and gets for string processing
The scanf function can process strings safely because its conversion specification accepts a maximum field width that limits how many characters are stored. This is shown in the following example.
// Example 01
#include <stdio.h>
#define SIZE 7
int main(void)
{
    char city[SIZE];
    printf("Insert the name of your city: "); // Columbus
    scanf("%6s", city);
    printf("The city is: %s", city); // Columb
    return 0;
}
/* ## Output ##
 * Insert the name of your city: Columbus
 * The city is: Columb
 */
If the user enters Columbus as the name of the city, no buffer overflow occurs: following the %6s conversion specification, scanf stores at most the first 6 characters of the string in city (and then appends the null character \0 at the end: reference). Therefore, when the result is printed, Columb appears as the name of the city.
The drawback is that, unlike in printf, the field width cannot be passed directly as an argument. More details in the Annex.
String processing can also be performed by the function gets. However, gets has no way to limit the length of the input and will read until it finds a newline or the end of the file (EOF). Rewriting the previous example with gets:
// Example 02
#include <stdio.h>
#define SIZE 7
int main(void)
{
    char city[SIZE];
    printf("Insert the name of the city: "); // Columbus
    gets(city);
    printf("The city is: %s", city); // ???
    return 0;
}
/* ## Possible Output ##
 * Insert the name of the city: Columbus
 * The city is: Columbus
 */
gets tries to store the complete string in city, which is not possible, since city cannot hold a string of 8 characters plus the terminating null character. In the tested case, gets wrote past the end of city into adjacent memory addresses to store the part of the string that did not fit, resulting in a buffer overflow. If the string is long enough, a segmentation fault is also to be expected (more details here and here). Buffer overflow is one of the main vulnerabilities exploited by attackers, so special attention should be paid to this issue (video: buffer overflow attack. Text: Buffer Overflow Exploitation). Because gets has no parameter that limits the length of the string to be stored, it is impossible to read strings safely with it (and reading strings is the only purpose of gets). Therefore, gets should never be used; it was removed from the C standard library as of C11.
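As a rough sketch of what a bounded replacement for gets might look like, the hypothetical helper read_line below (its name and signature are assumptions for illustration, not a standard function) wraps fgets, strips the trailing newline, and discards whatever part of an overlong line did not fit in the buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a standard function): reads one line into buf,
 * storing at most size - 1 characters plus the terminating '\0'.
 * The trailing newline is stripped; if the line was too long, the part
 * that did not fit is discarded so the next read starts on a fresh line.
 * Returns buf on success, NULL on end of file or read error. */
char *read_line(char *buf, size_t size, FILE *stream)
{
    assert(buf != NULL && size > 0 && stream != NULL);

    if (fgets(buf, (int)size, stream) == NULL) {
        return NULL;
    }

    size_t len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';    /* the whole line fit: drop the newline */
    } else {
        int c;              /* line too long (or EOF): skip the remainder */
        while ((c = getc(stream)) != EOF && c != '\n') {
        }
    }
    return buf;
}
```

A call such as read_line(city, sizeof city, stdin) would then take the place of gets(city) in Example 02; with a 7-byte buffer and the input Columbus, only Columb is stored.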
scanf and fgets for arithmetic data processing
In addition to strings, scanf also reads arithmetic data (integer and floating point values). In this case, however, scanf is not safe: there is no guaranteed protection against undefined behavior. The following code illustrates this. (Since the C standard specifies only the minimum magnitude of the limits of integer types, the values of LONG_MIN and LONG_MAX are implementation-dependent, but it is mandatory that LONG_MIN <= -2147483647 and LONG_MAX >= +2147483647.)
// Example 03
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
#include <errno.h>
#define SIZE 100
int main(void) {
    long a;
    char buffer[SIZE];
    printf("Enter a number: "); // 2147483648 (LONG_MAX + 1 for a 32-bit long)
    int success = scanf("%ld", &a);
    printf("a = %ld", a);
    getchar(); // consume the newline left behind by scanf
    printf("\nEnter a number: "); // 2147483648
    fgets(buffer, SIZE, stdin);
    errno = 0; // strtol only sets errno on failure, so clear it first
    long b = strtol(buffer, NULL, 10);
    if (b == LONG_MAX && errno == ERANGE) {
        printf("b: Overflow!\n");
    }
    else if (b == LONG_MIN && errno == ERANGE) {
        printf("b: Underflow!\n");
    }
    printf("b = %ld", b);
    return 0;
}
/* ## Possible Output ##
 * Enter a number: 2147483648
 * a = -2147483648
 * Enter a number: 2147483648
 * b: Overflow!
 * b = 2147483647
 */
The user enters a number too large for long to store (read Note). scanf reads the number and returns 1 (the return value of scanf indicates the number of values successfully assigned). However, the value does not fit in a, and according to the C standard, when the result of a scanf conversion cannot be represented in the target object, the behavior is undefined. In the test carried out with the example, the variable a was assigned the value -2147483648, which indicates that a wraparound took place. scanf, however, offers no way to test for such overflows. The situation can be mitigated by imposing a limit on the value read. Considering a long where LONG_MAX is +2147483647, it is possible to impose a limit by writing scanf("%9ld", &number). Note that a value with 10 digits (%10ld) would already open room for overflow (+9 999 999 999 > +2 147 483 647). With the limit of 9 digits, however, there is a range of values that long is able to store but that the code can no longer accept.
On the other hand, fgets offers protection. First, in the code, fgets(buffer, SIZE, stdin) limits the number of characters read, preventing a buffer overflow, which could be critical. Next, strtol performs the conversion to long: long b = strtol(buffer, NULL, 10). The value cannot be stored in the long type, so strtol:
- returns LONG_MAX, which avoids an overflow of the variable b;
- sets errno to ERANGE, indicating that an error has occurred, specifically that the value processed has an excessively large magnitude.
Because strtol never resets errno, errno should be set to 0 before the call for this check to be reliable.
It is worth noting that the C standard does not require scanf to set errno on overflow, which prevents a similar strategy from being adopted with scanf.
Note that the fgets logic is safe. Even if an overflow-based attack is attempted, all behaviors are well defined. There will be no buffer overflow and no integer overflow of the variable b, which will necessarily be within its valid range.
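The fgets plus strtol strategy can be packaged into a small validation routine. The sketch below is only one possible arrangement: parse_long and its return codes are hypothetical names chosen for illustration. In addition to the range check discussed above, it uses the end pointer of strtol to reject input that contains no number or that has trailing junk.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (name and return codes are assumptions):
 * converts the text of one input line to long.
 * Returns  0 on success and stores the value in *out,
 *         -1 if there is no number or junk follows it,
 *         -2 if the number is out of range for long. */
int parse_long(const char *text, long *out)
{
    assert(text != NULL && out != NULL);

    char *end;
    errno = 0;                          /* strtol never clears errno itself */
    long value = strtol(text, &end, 10);

    if (end == text) {
        return -1;                      /* no digits were consumed */
    }
    if (errno == ERANGE) {
        return -2;                      /* result clamped to LONG_MIN/LONG_MAX */
    }
    if (*end != '\0' && *end != '\n') {
        return -1;                      /* unexpected characters after the number */
    }
    *out = value;
    return 0;
}
```

Typical use would be to fill a buffer with fgets(buffer, SIZE, stdin) and then call parse_long(buffer, &b), treating any nonzero result as invalid input instead of printing b.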
For the float type, the situation is more subtle. According to IEEE 754, if a number is too large to be stored in a float, the variable must be assigned the special value inf or -inf (IEEE 754 topic 7.4 and covered here [topic 2 Overflow and underflow] and here [topic: 2.3.2 Overflow]). However, this is not required by the C standard, so an implementation may or may not reproduce the behavior described in IEEE 754. If there is compliance with IEEE 754, scanf, when reading an excessively large number, will assign the special value inf or -inf to the variable of type float. This is defined behavior and, in this sense, safe. This is illustrated in the code below.
// Example 04
#include <stdio.h>
#include <math.h>
int main(void) {
    float a;
    printf("Enter a number: "); // 2E40
    int success = scanf("%f", &a); // 1
    if (isinf(a)) {
        printf("Underflow or Overflow!\n"); // Underflow or Overflow!
    }
    printf("a = %f", a); // inf
    return 0;
}
/* ## Possible Output ##
 * Enter a number: 2E40
 * Underflow or Overflow!
 * a = inf
 */
This code has been tested on MSVC, Clang, GCC and TCC. In all cases, the variable a was assigned the special value inf. However, C implementations are not required to comply with IEEE 754, and therefore scanf is not safe for storing floating point numbers.
On the other hand, the fgets strategy involves two sequential operations:
- fgets stores the value read as a string in an array of chars
- strtof converts the string to float
The first operation is safe, as shown in Example 3. The second operation is also safe, since its behavior is determined by the C standard. If the value converted by strtof is outside the valid range, then HUGE_VALF (with the appropriate sign) is returned (reference). This guarantees defined behavior.
Therefore, processing floating point values with the fgets strategy is safe. The code below is the fgets version of Example 4.
// Example 05
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#define SIZE 50
int main(void) {
    char buffer[SIZE];
    float a;
    printf("Enter a number: "); // 2E40
    fgets(buffer, SIZE, stdin);
    buffer[strcspn(buffer, "\n")] = 0; // remove '\n'
    a = strtof(buffer, NULL);
    if (isinf(a)) {
        printf("Underflow or Overflow!\n");
    }
    printf("a = %f", a); // +inf
    return 0;
}
/* ## Output ##
 * Enter a number: 2E40
 * Underflow or Overflow!
 * a = inf
 */
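Instead of testing the result with isinf, the range error can also be detected through errno, in the same way as with strtol. The following is a minimal sketch under that assumption; parse_float and its return codes are hypothetical names, not part of any library.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (name and return codes are assumptions):
 * converts text to float.
 * Returns  0 on success and stores the value in *out,
 *         -1 if no number could be parsed,
 *         -2 if strtof reported a range error. */
int parse_float(const char *text, float *out)
{
    assert(text != NULL && out != NULL);

    char *end;
    errno = 0;                      /* clear any stale error first */
    float value = strtof(text, &end);

    if (end == text) {
        return -1;                  /* nothing was converted */
    }
    if (errno == ERANGE) {
        /* On overflow strtof returns +/-HUGE_VALF and sets ERANGE.
         * Whether underflow also sets ERANGE is implementation-defined. */
        return -2;
    }
    *out = value;
    return 0;
}
```

With the input 2E40 from Example 05, parse_float(buffer, &a) would report the range error without a ever being written.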
fgets for processing strings
In addition to arithmetic data processing, fgets can also be used for string processing, as a substitute for scanf. The first example with scanf can be adapted to an alternative version with fgets.
// Example 06
#include <stdio.h>
#define SIZE 7
int main(void) {
    char city[SIZE];
    printf("Insert the name of the city: "); // Columbus
    fgets(city, SIZE, stdin);
    printf("The city is: %s", city); // Columb
    return 0;
}
/* ## Output ##
 * Insert the name of the city: Columbus
 * The city is: Columb
 */
Like scanf, fgets also provides buffer overflow protection when processing strings. However, unlike scanf, fgets takes the buffer size directly as an argument (it reads at most SIZE - 1 characters and appends the null character), which in this case was passed through SIZE (more information in the Annex).
With printf it is possible to pass the value of the delimiter field through an argument. The following example illustrates this:
// Example 07
#include <stdio.h>
#define SIZE 6
int main(void) {
    char country[20] = "Canada";
    printf("%.*s \n", SIZE, country); // Canada
    printf("%.6s \n", country); // Canada
    return 0;
}
/* ## Output ##
 * Canada
 * Canada
 */
For scanf the only direct strategy is analogous to the second printf. This is a disadvantage, since passing the delimiter field through an argument, unlike the "hardcoded" strategy, makes it easy to work with cases where the value comes from:
and it is still convenient if used in multiple printf calls.
Note: scanf can receive the value for the delimiter field through an argument, but not directly. Details here and here.
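One indirect way to do this, shown here only as a sketch, is to build the format string at run time with snprintf and then pass it to the scanf family. The helper scan_word is a hypothetical name, and sscanf is used instead of scanf so that the example works on a string.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not a standard function): reads one
 * whitespace-delimited word from input into dst, never writing more than
 * dst_size bytes (terminating '\0' included).
 * Returns the sscanf result: 1 if a word was stored, 0 or EOF otherwise. */
int scan_word(const char *input, char *dst, size_t dst_size)
{
    assert(input != NULL && dst != NULL && dst_size > 1);

    char format[32];
    /* Builds e.g. "%6s" for a 7-byte buffer: the width excludes the '\0'. */
    int n = snprintf(format, sizeof format, "%%%zus", dst_size - 1);
    if (n < 0 || (size_t)n >= sizeof format) {
        return 0;                   /* could not build the format string */
    }
    return sscanf(input, format, dst);
}
```

When the size is a compile-time constant, another common approach is to stringify a macro into the format string with the preprocessor; the run-time version above is meant for sizes that are only known while the program runs.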
A buffer overflow can be defined as a write into memory regions that do not belong to the variable. In an integer overflow (or floating point overflow) this invasion does not necessarily happen: the term refers to the attempt to assign to a variable a value whose magnitude is too large for the variable to store. For an integer overflow, this constitutes undefined behavior, which could result in a buffer overflow (memory intrusion), a wraparound, etc.
scanf, gets and fgets processing strings
The default behavior of scanf with %s is to stop reading at the first whitespace found (reference). However, the whitespace is left in the input buffer. So the following code is not correct:
// Example 08
#include <stdio.h>
#define SIZE 20
int main(void) {
    char city[SIZE];
    char state[SIZE];
    printf("City: "); // Columbus
    scanf("%19s", city);
    printf("State: ");
    fgets(state, SIZE, stdin);
    return 0;
}
/* ## Possible Output ##
 * City: Columbus
 * State:
 */
What happens is that scanf reads the city entered by the user and leaves \n in the input buffer. fgets then reads the rest of the input buffer (\n) and stores it in the variable state. As a result, the user cannot enter the state. To correct this code, it is necessary to insert getchar after each scanf. However, this protection fails if the user enters the following sequence for the variable city: Columbus space enter. The program returns to the initial problem: getchar removes the space from the buffer, but the newline remains. An alternative that solves this problem is to replace getchar with while ((c = getchar()) != EOF && c != '\n'). This clears from the input buffer everything after the last character processed by scanf, up to and including the next newline or until the end of the file. So for this solution, each getchar would be replaced by:
int c;
while ((c = getchar()) != EOF && c != '\n');
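To make the fix concrete, the loop can be wrapped in a small function and applied to the logic of Example 08. This is a sketch only: discard_line and read_city_and_state are hypothetical names, and the functions take a FILE * (rather than always using stdin) purely so they can be exercised on any stream.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: discards the rest of the current input line,
 * up to and including the newline, or until end of file. */
void discard_line(FILE *stream)
{
    assert(stream != NULL);

    int c;
    while ((c = getc(stream)) != EOF && c != '\n') {
    }
}

/* Hypothetical helper mirroring Example 08: reads a one-word city with
 * fscanf, clears the leftover input, then reads the state line with fgets.
 * Both buffers are assumed to hold 20 bytes. Returns 1 on success, 0 otherwise. */
int read_city_and_state(FILE *stream, char city[20], char state[20])
{
    assert(stream != NULL && city != NULL && state != NULL);

    if (fscanf(stream, "%19s", city) != 1) {
        return 0;
    }
    discard_line(stream);           /* removes " \n" or any other leftovers */
    return fgets(state, 20, stream) != NULL;
}
```

With the input sequence Columbus space enter followed by Ohio enter, city receives Columbus and state receives Ohio together with its newline, since fgets keeps it.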
In both cases, fgets already intrinsically provides the necessary protection. That is because fgets stops reading only when it finds a newline, reaches the end of the file, or has read the maximum number of characters, whichever comes first (reference). In this case, the newline comes first; it is stored in the associated variable (in this case, city or state) and removed from the input buffer. Similarly, gets reads until a newline or the end of the file is encountered. If a newline is found, it is removed from the input buffer but, unlike with fgets, it is discarded rather than stored in the associated variable (reference).
Finally, several particularities of scanf and fgets are presented here.
Does this mean that in some cases scanf is guaranteed not to overflow and in others not?
Yes. String processing can be safely performed by scanf, provided a maximum field width is specified. However, when processing integer or floating point values there is no guarantee.
Are there cases where gets can be used safely or should it be avoided altogether?
It should be avoided completely. The gets function is safe only in environments where limits are imposed on stdin, which is a very specific case.
How is fgets more secure?
The function fgets, combined with conversion functions such as strtol and strtof, provides security for both string and arithmetic data processing (integer and floating point).