
Using a Macro for String Length in scanf() Format Specifier in C


Is it possible to substitute the number 9 with a macro to have more code maintainability in this line of code?

scanf("%9[^\n]s", str);

I tried to read the documentation, but I can't find the exact names of these operations:

  1. "[^\n]s"

  2. "%ns"

I attempted these alternatives, but CLion marks the first occurrence of str as an error in both lines:

scanf("%" str(MAX_LENGTH) "%[^\n]s", str);

scanf("%" str(MAX_LENGTH) "[^\n]%*c", str);

Solution

  • The reason you get an error is that str is neither a builtin function nor a predefined macro. You can define str as a macro and use the stringization operator # to perform the substitution, but it is tricky and confusing: str must be defined as a macro that invokes another macro xstr, which in turn stringizes its argument with #x:

    #define xstr(x)  #x
    #define str(x)  xstr(x)
    

    Note however that both of your examples have problems:

    • scanf("%" str(MAX_LENGTH) "%[^\n]s", str); has an extra s at the end of the format, which is useless and indicates a confusion between the %[...] conversion and %s, both of which require a character count prefix to prevent buffer overflow. The second % is also incorrect. Furthermore, you should not use the same identifier str for the macro and the destination array: while not an error, it makes the code unnecessarily confusing. The code should be written:

        char buf[MAX_LENGTH + 1];
        scanf("%" str(MAX_LENGTH) "[^\n]", buf);
      
    • scanf("%" str(MAX_LENGTH) "[^\n]%*c", str); has the correct form but will unconditionally consume the next byte after the match, which is not the newline character if the line has more than MAX_LENGTH bytes before the newline. No indication of this is returned to the caller.

    • %9[^\n] will fail on empty input lines because no characters match the conversion specification. scanf() will return 0 and store nothing into the destination array, so its contents stay whatever they were before the call (indeterminate if the array was never initialized).
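
    The last two points can be observed without any terminal input by running the same conversions through sscanf(). The sketch below is only an illustration: scan_field is a hypothetical helper, the field width is hard-coded to 9, and %n is added to report how many bytes were consumed.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original answer): applies
   "%9[^\n]%*c" to the string `input` and stores the number of bytes
   consumed in *consumed (0 if the %*c step was never completed).
   Returns the sscanf() result: 1 on a match, 0 if the line is empty,
   EOF if the input is exhausted. */
int scan_field(const char *input, char *buf, int *consumed) {
    *consumed = 0;
    buf[0] = '\0';      /* give buf a defined value in case nothing is stored */
    return sscanf(input, "%9[^\n]%*c%n", buf, consumed);
}
```

    With the input "abcdefghijKLM\n" the helper returns 1, stores "abcdefghi" and reports 10 bytes consumed: the %*c swallowed the 'j', not the newline. With the input "\nabc" it returns 0 because the bracket conversion matched nothing.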

    Here is a short example that uses the two macros:

    #include <stdio.h>
    
    #define MAX_LENGTH  9
    
    #define xstr(x)  #x
    #define str(x)  xstr(x)
    
    int main(void) {
        char buf[MAX_LENGTH + 1];
        if (scanf("%" str(MAX_LENGTH) "[^\n]", buf) == 1) {
            printf("got |%s|\n", buf);
        } else {
            printf("invalid input\n");
        }
        return 0;
    }
    

    If str were defined directly as #define str(x) #x, invoking str(MAX_LENGTH) would expand to "MAX_LENGTH", because the # operator stringizes its argument without expanding it. With the two-level definition, the argument is macro-expanded before the second macro is invoked, hence str(MAX_LENGTH) expands to xstr(9), which expands to "9".
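
    The difference between the one-level and the two-level definitions can be checked directly. In this minimal sketch the macro and function names (STR_DIRECT, STR_EXPANDED, direct_result, expanded_result) are hypothetical and chosen only for the comparison:

```c
#define MAX_LENGTH  9

#define STR_DIRECT(x)    #x        /* stringizes the argument exactly as written */
#define XSTR(x)          #x
#define STR_EXPANDED(x)  XSTR(x)   /* x is macro-expanded before XSTR stringizes it */

/* Hypothetical accessors that expose the resulting string literals. */
const char *direct_result(void)   { return STR_DIRECT(MAX_LENGTH); }
const char *expanded_result(void) { return STR_EXPANDED(MAX_LENGTH); }
```

    direct_result() returns "MAX_LENGTH" and expanded_result() returns "9"; only the second is usable as a field width.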

    Note also that MAX_LENGTH is not the length of the destination array: you must add an extra character for the null terminator, and there is no consistency check in the macro invocation: the consistency between MAX_LENGTH and the definition of buf relies entirely on the programmer.
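
    If a C11 compiler is available, part of that burden can be shifted to the compiler with _Static_assert. This is only a sketch under that assumption: line_buf and parse_line are hypothetical names, and the input is parsed from a string with sscanf() to keep the example self-contained.

```c
#include <stdio.h>

#define MAX_LENGTH  9

#define xstr(x)  #x
#define str(x)  xstr(x)

/* Hypothetical buffer, declared separately from the format string. */
static char line_buf[10];

/* Compilation fails if the buffer cannot hold MAX_LENGTH characters
   plus the null terminator. */
_Static_assert(sizeof line_buf >= MAX_LENGTH + 1,
               "line_buf is too small for the scanf field width");

/* Hypothetical helper: parses up to MAX_LENGTH non-newline characters
   from `input`; returns line_buf on success, NULL otherwise. */
const char *parse_line(const char *input) {
    if (sscanf(input, "%" str(MAX_LENGTH) "[^\n]", line_buf) == 1)
        return line_buf;
    return NULL;
}
```

    The assertion only works where the array type is visible to sizeof; it cannot help once the buffer has decayed to a pointer.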

    Furthermore, if the definition of MAX_LENGTH is not an integer constant without a suffix, this macro expansion trick will fail to produce a correct scanf conversion specifier.
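
    To make this failure mode concrete, the sketch below stringizes three hypothetical definitions of the width (WIDTH_PLAIN, WIDTH_SUFFIXED and WIDTH_EXPR are illustration names, not part of the question). All three have the value 9, but only the first yields a usable format:

```c
#define xstr(x)  #x
#define str(x)  xstr(x)

#define WIDTH_PLAIN     9        /* plain decimal constant */
#define WIDTH_SUFFIXED  9u       /* same value, with a suffix */
#define WIDTH_EXPR      (10-1)   /* same value, written as an expression */

/* Hypothetical accessors returning the format each definition produces. */
const char *format_plain(void)    { return "%" str(WIDTH_PLAIN) "[^\n]"; }
const char *format_suffixed(void) { return "%" str(WIDTH_SUFFIXED) "[^\n]"; }
const char *format_expr(void)     { return "%" str(WIDTH_EXPR) "[^\n]"; }
```

    The results are "%9[^\n]", "%9u[^\n]" and "%(10-1)[^\n]". The second silently becomes a %9u conversion followed by literal characters, and the third is not a valid conversion specification at all. The preprocessor reports neither problem.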

    A more reliable approach would use snprintf to construct the scanf() format string:

    #include <stdio.h>
    
    #define MAX_LENGTH  9
    
    int main(void) {
        char buf[MAX_LENGTH + 1];
        char format[20];
        snprintf(format, sizeof format, "%%%zu[^\n]", sizeof(buf) - 1);
        if (scanf(format, buf) == 1) {
            printf("got |%s|\n", buf);
        } else {
            printf("invalid input\n");
        }
        return 0;
    }
    

    This version works better but has its own shortcomings: because the format is no longer a string literal, the compiler cannot check its consistency with the remaining scanf() arguments, and it may emit a warning when such diagnostics are enabled (for example with -Wformat-nonliteral in gcc and clang). That consistency check is quite useful, whereas the format string used to construct the scanf() format is easy to get wrong.
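
    One way to contain the fragile part is to build the format in a single small function that validates its inputs and the snprintf() result. The following is a sketch, not a drop-in solution; make_line_format is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical helper: writes "%<buf_size - 1>[^\n]" into fmt.
   Returns 0 on success, -1 if buf_size leaves no room for any data
   or if the format does not fit into fmt_size bytes. */
int make_line_format(char *fmt, size_t fmt_size, size_t buf_size) {
    int n;

    if (buf_size < 2)
        return -1;
    /* "%%" produces a literal '%', "%zu" prints the field width */
    n = snprintf(fmt, fmt_size, "%%%zu[^\n]", buf_size - 1);
    return (n < 0 || (size_t)n >= fmt_size) ? -1 : 0;
}
```

    For a 10-byte destination, make_line_format(format, sizeof format, 10) stores "%9[^\n]" in format. The compiler still cannot check the later scanf(format, buf) call, so this only narrows the problem to one place.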

    In the end, both approaches are cumbersome and error-prone. It is much more reliable to use fgets() for your purpose and manually remove the trailing newline:

    #include <stdio.h>
    #include <string.h>
    
    #define MAX_LENGTH  9
    
    int main(void) {
        char buf[MAX_LENGTH + 2];
        if (fgets(buf, sizeof buf, stdin)) {
            buf[strcspn(buf, "\n")] = '\0';
            printf("got |%s|\n", buf);
        } else {
            printf("no input\n");
        }
        return 0;
    }
    

    The behavior is slightly different: fgets() consumes the newline unless the line is too long, in which case the rest of the line stays in the input stream, which makes error recovery more difficult.
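
    Recovery is still possible by checking whether a newline was stored and discarding the remainder when it was not. The sketch below shows one way to do it; read_line_checked is a hypothetical helper and assumes the buffer has one spare byte for the newline, as in buf[MAX_LENGTH + 2] above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the original answer.
   Reads one line with fgets() into buf (size bytes, one of which is
   assumed to be reserved for the newline).
   Returns 1 for a complete line (newline removed), 0 if the line did
   not fit (the rest of the line is discarded), and -1 at end of file,
   on a read error, or if size < 2. */
int read_line_checked(char *buf, int size, FILE *fp) {
    size_t len;
    int c;

    if (size < 2 || fgets(buf, size, fp) == NULL)
        return -1;
    len = strcspn(buf, "\n");
    if (buf[len] == '\n') {
        buf[len] = '\0';        /* complete line: strip the newline */
        return 1;
    }
    if (len + 1 < (size_t)size)
        return 1;               /* last line of the stream, no newline */
    /* the buffer is full and no newline was seen: discard the excess */
    while ((c = getc(fp)) != EOF && c != '\n')
        continue;
    return 0;
}
```

    When the function returns 0, buf holds the first size - 1 bytes of the overlong line and the stream is positioned at the start of the next line.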

    A better solution overall seems to be a custom function:

    #include <stdio.h>
    
    #define MAX_LENGTH  9
    
    /* read a line from a stream and truncate excess characters */
    int get_line(char *dest, int size, FILE *fp) {
        int c;
        int i = 0, j = 0;
    
        while ((c = getc(fp)) != EOF && c != '\n') {
            if (j + 1 < size)
                dest[j++] = c;
            i++;
        }
        if (j < size) {
            dest[j] = '\0';
        }
        return (i == 0 && c == EOF) ? EOF : i;
    }
    
    int main(void) {
        char buf[MAX_LENGTH + 1];
    
        if (get_line(buf, sizeof buf, stdin) == EOF) {
            printf("invalid input\n");
        } else {
            printf("got |%s|\n", buf);
        }
        return 0;
    }
    

    Note that the behavior is still subtly different from the original scanf() call, but potentially closer to your goals:

    • get_line reads a full line; the newline and any excess characters are discarded.
    • get_line always stores a C string into the destination array if size is not 0, even at end of file where buf will be an empty string. scanf() would return EOF at end of file and leave buf unchanged.
    • get_line will accept empty lines, whereas scanf() would fail, return 0 and store nothing into buf, a limitation you probably were not aware of.
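
    Because get_line returns the full length of the line rather than the number of bytes stored, the caller can also detect truncation, which the scanf() version cannot report. The sketch below repeats get_line so that it compiles on its own (returning EOF at end of file) and adds a hypothetical caller, count_truncated_lines, purely for illustration:

```c
#include <stdio.h>

#define MAX_LENGTH  9

/* Copy of get_line from the answer above: reads a line from a stream
   and truncates excess characters; returns the line length or EOF. */
int get_line(char *dest, int size, FILE *fp) {
    int c;
    int i = 0, j = 0;

    while ((c = getc(fp)) != EOF && c != '\n') {
        if (j + 1 < size)
            dest[j++] = c;
        i++;
    }
    if (j < size) {
        dest[j] = '\0';
    }
    return (i == 0 && c == EOF) ? EOF : i;
}

/* Hypothetical caller: reads every line from fp and returns how many
   of them were longer than MAX_LENGTH characters. */
int count_truncated_lines(FILE *fp) {
    char buf[MAX_LENGTH + 1];
    int len, truncated = 0;

    while ((len = get_line(buf, sizeof buf, fp)) != EOF) {
        if (len > MAX_LENGTH)
            truncated++;        /* buf holds only the first MAX_LENGTH bytes */
    }
    return truncated;
}
```

    An empty line yields a return value of 0 and an empty string in buf, so it is counted as a valid, untruncated line.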

    Conclusion: scanf() is full of quirks and pitfalls. Trying to avoid buffer overflows with an explicit character count is a good idea, but scanf() will cause other problems that are not easily handled. Writing custom code is often required to get precise and consistent semantics.