Tags: c, scanf, overflow, fgets, gets

In which cases is scanf safe in terms of overflow? And in which cases must it necessarily be replaced by another function such as fgets?


It is usually said that scanf is not a safe function. Clang and GCC do not issue any warnings, but MSVC refuses to compile the call (unless _CRT_SECURE_NO_WARNINGS is defined):

Error C4996 'scanf': This function or variable may be unsafe.

  • Does this mean that in some cases scanf is guaranteed not to overflow and in others not?
  • If so, which exactly are these cases?

Another function for reading data is gets.

  • Are there cases where gets can be used safely or should it be avoided altogether?

fgets is usually suggested as an alternative to scanf and gets.

  • How is fgets more secure?

Solution

  • Functions scanf and gets for string processing

    The scanf function can process strings safely because the conversion specification accepts a field width that limits the number of characters stored. This is shown in the following example.

    // Example 01   
    #include <stdio.h>
    #define SIZE 7
    
    int main(void)
    {
        char city[SIZE];
        printf("Insert the name of your city: ");    // Columbus
        scanf("%6s", city);
        printf("The city is: %s", city);             // Columb
    
        return 0;
    }
    
    /* ## Output ##
     * Insert the name of your city: Columbus
     * The city is: Columb
     */
    

    If the user enters Columbus as the name of the city, a buffer overflow will not occur, since scanf stores only the first 6 characters of the string in city, according to the %6s conversion specification (and then appends \0, the null character: reference). Therefore, when the result is shown on the screen, Columb appears as the name of the city.

    The drawback is that, unlike printf, scanf does not accept the field width directly as an argument. More details in the Annex.

    String processing can also be performed by the function gets. However, gets does not have any delimiter fields and will read until it finds a newline or the end of the file (EOF). Rewriting the previous example for gets:

    // Example 02   
    #include <stdio.h>
    #define SIZE 7
    
    int main(void)
    {
        char city[SIZE];
        printf("Insert the name of the city: ");    // Columbus
        gets(city);
        printf("The city is: %s", city);            // ???
    
        return 0;
    }
    
    /* ## Possible Output ##
     * Insert the name of the city: Columbus
     * The city is: Columbus
     */
    

    gets tries to store the complete string in city, which is not possible: city has room for 6 characters plus the terminating null character, and Columbus has 8. In the tested case, gets wrote the part of the string that did not fit in city into adjacent memory addresses, resulting in a buffer overflow. If the string is long enough, a segmentation fault is also likely (more details here and here). Buffer overflow is one of the main vulnerabilities exploited by attackers, so special attention should be paid to this issue (video: buffer overflow attack. Text: Buffer Overflow Exploitation). Because gets has no parameter that limits the length of the string to be stored, it is impossible to read strings safely with it (and reading strings is the only purpose of gets). Therefore, gets should never be used, and it was removed from the C standard library in C11.

    Functions scanf and fgets for arithmetic data processing

    In addition to strings, scanf also reads arithmetic data (integer and floating point values). In this case, however, scanf is not safe, and there is no guaranteed protection against undefined behavior. The following code illustrates this. Since the C standard specifies only the minimum range of each integer type, the values of LONG_MIN and LONG_MAX are implementation-dependent, but it is mandatory that LONG_MIN <= -2147483647 and LONG_MAX >= +2147483647.

    // Example 03    
    #include <stdio.h>
    #include <stdlib.h>    // strtol
    #include <limits.h>
    #include <errno.h>
    
    #define SIZE 100
    
    int main(void) {
    
        long a;
        char buffer[SIZE];
    
        printf("Enter a number: ");            // 2147483648 (LONG_MAX + 1)
        int success = scanf("%ld", &a);
        printf("a = %ld", a);
        getchar();
    
        printf("\nEnter a number: ");          // 2147483648
        fgets(buffer, SIZE, stdin);
        errno = 0;                             // strtol sets errno only on failure
        long b = strtol(buffer, NULL, 10);
        if (b == LONG_MAX && errno == ERANGE) {
            printf("b: Overflow!\n");
        }
        else if (b == LONG_MIN && errno == ERANGE) {
            printf("b: Underflow!\n");
        }
        printf("b = %ld", b);
    
        return 0;
    }
    
    /* ## Possible Output ##
     * Enter a number:  2147483648
     * a = -2147483648
     * Enter a number: 2147483648
     * b: Overflow!
     * b = 2147483647
     */
    

    The user enters a number too large for long to store (read Note). scanf reads the number and returns 1 (the return value of scanf is the number of values successfully assigned). However, the converted value cannot be represented in a long, and according to the C standard the behavior is then undefined. In the test carried out with the example, the variable a received the value -2147483648, which indicates that a wraparound took place. scanf offers no way to detect this overflow.

    The situation can be mitigated by limiting the number of digits read. Considering a long where LONG_MAX is +2147483647, a limit can be imposed by writing scanf("%9ld", &number). A field width of 10 digits (%10ld) would already leave room for overflow (+9 999 999 999 > +2 147 483 647). With the 9-digit limit, however, a range of values that long can store (for example, from 1 000 000 000 to 2 147 483 647) is excluded.

    On the other hand, the fgets strategy offers protection. First, fgets(buffer, SIZE, stdin) limits the number of characters read, preventing a buffer overflow, which could be critical. Next, strtol performs the conversion to long: long b = strtol(buffer, NULL, 10). Since the value cannot be stored in a long, strtol:

    1. Returns the largest possible value: LONG_MAX. With this, it avoids an overflow of the variable b.
    2. Sets the errno flag to ERANGE indicating that an error has occurred, specifically a value processed with excessively large magnitude.

    It is worth noting that scanf does not set errno in this case, which prevents a similar strategy from being adopted with scanf.

    Note that the fgets approach is safe. Even if an overflow-based attack is attempted, all behaviors are well defined. There will be no buffer overflow and no integer overflow of variable b, which will necessarily be within its valid range.
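The fgets and strtol steps described above can be wrapped in a small helper. The following is only a sketch: the name parse_long and its return codes are assumptions made for illustration and are not part of the original answer. It also checks the end pointer reported by strtol, so that input without digits or with trailing garbage is rejected.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original answer).
 * Converts the text in s to a long.
 * Returns  0 on success (value stored in *out),
 *         -1 if no digits were found or other characters follow the number,
 *         -2 if the value does not fit in a long. */
int parse_long(const char *s, long *out)
{
    assert(s != NULL && out != NULL);

    char *end;
    errno = 0;                        /* strtol sets errno only on failure */
    long value = strtol(s, &end, 10);

    if (end == s) {
        return -1;                    /* no conversion was performed */
    }
    if (errno == ERANGE) {
        return -2;                    /* result was clamped to LONG_MIN or LONG_MAX */
    }
    /* Accept the trailing newline that fgets leaves in the buffer. */
    if (*end != '\0' && *end != '\n') {
        return -1;
    }
    *out = value;
    return 0;
}
```

A caller would typically fill a buffer with fgets and pass it to parse_long, asking the user again whenever the result is not 0.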

    For the float type, the situation is more subtle. According to IEEE 754, a number too large to be stored in a float is replaced by the special value inf or -inf (IEEE 754 topic 7.4 and covered here [topic 2 Overflow and underflow] and here [topic: 2.3.2 Overflow]). However, the C standard does not require this, so an implementation may or may not reproduce the behavior described in IEEE 754. Where IEEE 754 is followed, scanf assigns inf or -inf to the float variable when it reads an excessively large number. This is defined behavior and, in this sense, safe. This is illustrated in the code below.

    // Example 04
    #include <stdio.h>
    #include <math.h>
    
    int main(void) {
    
        float a;
        printf("Enter a number: ");               // 2E40
        int success = scanf("%f", &a);            // 1
    
        if (isinf(a)) {
            printf("Underflow or Overflow!\n");   // Underflow or Overflow!
        }
        printf("a = %f", a);                      // inf
    
        return 0;
    }
    
    /* ## Possible Output ##
     * Enter a number: 2E40
     * Underflow or Overflow!
     * a = inf
     */
    

    This code has been tested on MSVC, Clang, GCC and TCC. In all cases, the special value inf was assigned to the variable a. However, C compilers are not required to comply with IEEE 754, and therefore scanf is not safe for storing floating point numbers.

    On the other hand, the fgets strategy involves two sequential operations:

    1. fgets stores the value read as a string in an array of chars
    2. strtof converts the string to float

    The first operation is safe, as shown in Example 3. The second operation is also safe, since its behavior is determined by the C standard. If the value converted by strtof is outside the valid range, then plus or minus HUGE_VALF is returned (reference). With this, there is the certainty of a defined behavior.
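The strtof step can be sketched as a helper that reports overflow explicitly. The name parse_float and its return codes are hypothetical. The sketch relies on the C standard's rule that strtof stores ERANGE in errno when the value overflows; for underflow, whether errno is set is implementation-defined, so that case is not reported separately here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original answer).
 * Converts the text in s to a float.
 * Returns  0 on success (value stored in *out),
 *         -1 if the text does not start with a number,
 *         -2 if strtof reported a range error (ERANGE). */
int parse_float(const char *s, float *out)
{
    assert(s != NULL && out != NULL);

    char *end;
    errno = 0;                     /* strtof sets errno only on failure */
    float value = strtof(s, &end);

    if (end == s) {
        return -1;                 /* no conversion was performed */
    }
    if (errno == ERANGE) {
        return -2;                 /* e.g. 2E40 is too large for a float */
    }
    *out = value;
    return 0;
}
```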

    Therefore, the processing of floating point values with the fgets strategy is safe. The code below is the fgets version of Example 4.

    // Example 05
    #include <stdio.h>
    #include <math.h>
    #include <stdlib.h>
    #include <string.h>    // strcspn
    #define SIZE 50
    
    int main(void) {
    
        char buffer[SIZE];
        float a;
        printf("Enter a number: ");           // 2E40        
        fgets(buffer, SIZE, stdin);
        buffer[strcspn(buffer, "\n")] = 0;    // remove '\n'    
        a = strtof(buffer, NULL);
    
        if (isinf(a)) {
            printf("Underflow or Overflow!\n");
        }
        printf("a = %f", a);                  // +inf
    
        return 0;
    }
    
    /* ## Output ##
     * Enter a number: 2E40
     * Underflow or Overflow!
     * a = inf
     */
    

    Function fgets for processing strings

    In addition to arithmetic data processing, fgets can also be used for string processing, as a substitute for scanf. The first example with scanf can be adapted to an alternative version with fgets.

    // Example 06
    #include <stdio.h>
    #define SIZE 7
    
    int main(void) {
        char city[SIZE];
        printf("Insert the name of the city: ");   // Columbus
        fgets(city, SIZE, stdin);
        printf("The city is: %s", city);           // Columb
    
        return 0;
    }
    
    /* ## Output ##
     * Insert the name of the city: Columbus
     * The city is: Columb
     */
    

    Like scanf, fgets also provides buffer overflow protection when processing strings. However, unlike scanf, fgets accepts the maximum number of characters to read directly as an argument, which in this case was passed through SIZE (more information in the Annex).

    Annex

    With printf it is possible to insert the value for the delimiter field through an argument. The following example illustrates this:

    // Example 07
    #include <stdio.h>
    #define SIZE 6
    
    int main(void) {
    
        char country[20] = "Canada";
        printf("%.*s \n", SIZE, country);    // Canada
        printf("%.6s \n", country);          // Canada
    
        return 0;
    }
    
    /* ## Output ##
     * Canada
     * Canada
     */
    

    For scanf the only direct strategy is analogous to the second printf. This is a disadvantage, since passing the delimiter field through an argument, unlike the "hardcoded" strategy, makes it easy to work with cases where the value comes from:

    1. A variable from another file
    2. A user-entered argument

    and it is also convenient when the same value is used in multiple printf calls.

    Note: scanf can receive the value for the delimiter field through an argument, but not directly. Details here and here.
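One indirect way to do this, shown here only as a sketch, is to build the format string at run time with snprintf and then pass it to fscanf. The helper name read_word is hypothetical, and the technique is an assumption about what the linked details describe.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not from the original answer).
 * Reads one whitespace-delimited word from in into dst, which has room
 * for size bytes. The field width (size - 1) is written into the format
 * string at run time, e.g. "%6s" for size == 7.
 * Returns 1 if a word was read, 0 otherwise. */
int read_word(FILE *in, char *dst, size_t size)
{
    assert(in != NULL && dst != NULL);

    if (size < 2) {
        return 0;                  /* no room for a character plus '\0' */
    }

    char format[32];
    int n = snprintf(format, sizeof format, "%%%zus", size - 1);
    if (n < 0 || (size_t)n >= sizeof format) {
        return 0;                  /* format string could not be built */
    }

    return fscanf(in, format, dst) == 1;
}
```

With a 7-byte array and the input Columbus, the first call stores Columb and a second call stores the remaining us.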

    Note

    A buffer overflow can be defined as writing to memory regions that do not belong to the variable. In an integer overflow (or floating point overflow) this does not necessarily happen. That type of overflow refers to the attempt to assign to a variable a value whose magnitude is too large for it to store. For an integer overflow, this constitutes undefined behavior, which could result in a buffer overflow (memory intrusion), a wraparound, etc.

    Additional Topic:

    Particularities of scanf, gets and fgets processing strings

    The default behavior of scanf when reading a string with %s is to stop at the first whitespace character found (reference). However, that whitespace is left in the input buffer. So the following code is not correct:

    // Example 08
    #include <stdio.h>
    #define SIZE 20
    
    int main(void) {
    
        char city[SIZE];
        char state[SIZE];
    
        printf("City: ");            // Columbus  
        scanf("%19s", city);
        printf("State: ");
        fgets(state, SIZE, stdin);
    
        return 0;
    }
    
    /* ## Possible Output ##
     * City: Columbus
     * State:
     */
    

    What happens is that scanf reads the city entered by the user and leaves \n in the input buffer. fgets then reads the rest of the input buffer (\n) and stores it in the variable state. As a result, the user cannot enter the state. One correction is to call getchar after each scanf. However, this protection fails if the user enters the following sequence for the variable city: Columbus space enter. The program returns to the initial problem: getchar removes the space from the buffer, but the newline remains. An alternative is to replace getchar with while ((c = getchar()) != EOF && c != '\n'). This discards everything in the input buffer after the last character processed by scanf, up to and including the next newline, or until the end of the file. So for this solution, each getchar would be replaced by:

    int c;
    while ((c = getchar()) != EOF && c != '\n');
    

    For both cases, fgets already provides the necessary protection. That is because fgets stops reading only when it finds a newline, reaches the end of the file, or has read the maximum number of characters, whichever comes first (reference). In this case, the newline comes first; it is stored in the associated variable (here, city or state) and removed from the input buffer. Similarly, gets reads until a newline or the end of the file is encountered. Unlike fgets, however, gets discards the newline instead of storing it in the associated variable (reference).
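The fgets behavior described above can be combined with the discard loop in one helper. This is a minimal sketch; the name read_line and its return codes are hypothetical. It removes the stored newline and, when the line was longer than the array, discards the rest of the line so that the next read starts on a fresh line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original answer).
 * Reads one line from in into dst, which has room for size bytes
 * (size is assumed to fit in an int, as fgets requires).
 * Returns  1 if the whole line fit in dst,
 *          0 if the line was truncated (the rest is discarded),
 *         -1 if nothing could be read (end of file or error). */
int read_line(FILE *in, char *dst, size_t size)
{
    assert(in != NULL && dst != NULL && size > 0);

    if (fgets(dst, (int)size, in) == NULL) {
        return -1;
    }

    size_t len = strcspn(dst, "\n");
    if (dst[len] == '\n') {
        dst[len] = '\0';           /* drop the newline stored by fgets */
        return 1;
    }

    /* No newline was stored: the line was too long or the input ended.
     * Discard what is left of the line. */
    int c;
    int truncated = 0;
    while ((c = getc(in)) != EOF && c != '\n') {
        truncated = 1;
    }
    return truncated ? 0 : 1;
}
```

With 7-byte arrays and the input lines Columbus and Ohio, the first call stores Columb and reports truncation, and the second call stores Ohio.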

    Finally, several particularities of scanf and fgets are presented here.

    Conclusion

    Does this mean that in some cases scanf is guaranteed not to overflow and in others not?

    Yes. String processing can be safely performed by scanf, provided a field width is specified. However, when processing integer or floating point values there is no guarantee.

    Are there cases where gets can be used safely or should it be avoided altogether?

    It should be avoided completely. The gets function is safe only in environments where limits are imposed on stdin, which is a very specific case.

    How is fgets more secure?

    The function fgets provides security for both string and arithmetic data processing (integer and floating point).