Search code examples
cundefined-behaviorvariadic-functionsc99ansi-c

Is this use of va_copy undefined behaviour?


I made a function that prints pairs of keys and values, where the keys are trusted compile time literal strings which possibly can contain printf specifiers. Now my question is: is this function legal according to the c standard (like c99)? If not, can I do anything to conform?

void pairs(char *message, ...)
{
    va_list args;
    va_start(args, message);

    char *key = va_arg(args, char *);
    if (key == NULL) {
        va_end(args);
        return;
     }

    while (key != NULL) {
        if (key[0] == '%') {
            int len = strlen(key);
            printf("%s=", &key[len + 1]);

            // Copy the current state of the va_list and pass it to vprintf
            va_list args2;
            va_copy(args2, args);
            vprintf(key, args2);
            va_end(args2);

            // Consume one argument assuming vprintf consumed one
            va_arg(args, char *);
        } else {
            // if no format assume it is a string
            fprintf(_log_stream, "%s=\"%s\"", key, va_arg(args, char *));
        }

        key = va_arg(args, char *);
        if (key == NULL)
            break;

        printf(", ");
    }
    va_end(args);
}

It should work something like

    pairs("This is a beautiful value",
            "%d\0hello", 10,
            "pass", user_var,
            "hello", "bye", NULL);

And it actually outputs correctly: hello=10, pass="sus", hello="bye" So it works on my machine (using gcc). But again, is it compliant?

Edit: I found https://www.gnu.org/software/libc/manual/html_node/Variable-Arguments-Output.html

Portability Note: The value of the va_list pointer is undetermined after the call to vprintf, so you must not use va_arg after you call vprintf. Instead, you should call va_end to retire the pointer from service. You can call va_start again and begin fetching the arguments from the start of the variable argument list. (Alternatively, you can use va_copy to make a copy of the va_list pointer before calling vfprintf.) Calling vprintf does not destroy the argument list of your function, merely the particular pointer that you passed to it.

Which is the reason why I use va_copy in the first place, but I am not sure if there is a way to get around that restriction.


Solution

  • Is this use of va_copy undefined behaviour?

    No.

    is this function legal according to the c standard (like c99)?

    No. The:

     key = va_arg(args, char *);
    

    is undefined behavior. The next argument is 10, which is an int.

    (Also, technically, NULL is not a char *. NULL is either an int or void *, no one knows. It is also not valid for NULL. But, nowadays NULL is universally (void *)0, and on sane systems all pointers are the same size, bottom line if you use char * no one will notice. But if you want to be correct, the argument should be (char *)NULL.)

    If not, can I do anything to conform?

    What you want to do in the way you want to do is not possible. You are not able to use va_list after vprintf has used it. After vprintf has used va_list you have to va_end it.

    If you want to do that, you have to parse the format string yourself and get the types of arguments yourself and advance va_list yourself and either implement vprintf yourself or you could call it separately.


    can I do anything

    Och yea, sure, totally. So let's implement a pairs as a variadic macro that just forwards to printf.

    #include <boost/preprocessor/facilities/overload.hpp>
    #include <stdio.h>
    
    #define PAIRS1_0()
    #define PAIRS1_2(a, b)       " " a
    #define PAIRS1_4(a, b, ...)  " " a  PAIRS1_2(__VA_ARGS__)
    #define PAIRS1_6(a, b, ...)  " " a  PAIRS1_4(__VA_ARGS__)
    
    #define PAIRS2_0()
    #define PAIRS2_2(a, b)       b
    #define PAIRS2_4(a, b, ...)  b, PAIRS2_2(__VA_ARGS__)
    #define PAIRS2_6(a, b, ...)  b, PAIRS2_4(__VA_ARGS__)
    
    #define pairs(pre, ...)  \
        printf(  \
            pre  \
            BOOST_PP_OVERLOAD(PAIRS1_, __VA_ARGS__)(__VA_ARGS__) \
            "\n",  \
            BOOST_PP_OVERLOAD(PAIRS2_, __VA_ARGS__)(__VA_ARGS__)  \
        )
    
    int main() {
        pairs("This is a beautiful value",
            "hello=%d", 10,
            "hello=%s", "bye");
    }
    

    The call becomes a single printf with space joined arguments. The first overloaded macro joins the strings into one string, and then the second overloaded macro takes the second arguments. It is a simple shuffle to re-order arguments for printf.

    But this is not satisfactory. You might delve into this more, by using _Generic to guess the format specifier. From the arguments, you can construct an array of "printers" that take va_list by a pointer. Not by value! You can only modify the "upper" function va_list by passing with a pointer. Then you can iterate over printers to print the values, one by one.

    #include <stdio.h>
    #include <stdarg.h>
    
    // printers for pairs
    
    void pairs_int(va_list *va) {
        const int v = va_arg(*va, int);
        printf("%d", v);
    }
    void pairs_charp(va_list *va) {
        char *p = va_arg(*va, char *);
        printf("%s", p);
    }
    
    #define PRINTER(x)  \
        _Generic((x) \
        , int: &pairs_int  \
        , char *: &pairs_charp \
        )
    
    typedef void (*printer_t)(va_list *va);
    
    // the actual logic
    void pairs_in(const char *pre, const printer_t printers[], ...) {
        va_list va;
        va_start(va, printers);
        fputs(pre, stdout);
        for (const printer_t *printer = printers; printer; printer++) {
            putchar(' ');
            fputs(va_arg(va, char*), stdout);
            putchar('=');
            (*printer)(&va);
            fflush(stdout);
        }
        va_end(va);
        putchar('\n');
    }
    
    // Convert arguments into an array of printers
    #define PRINTERS_0()
    #define PRINTERS_2(a, b)       PRINTER(b)
    #define PRINTERS_4(a, b, ...)  PRINTERS_2(a, b), PRINTERS_2(__VA_ARGS__)
    #define PRINTERS_6(a, b, ...)  PRINTERS_2(a, b), PRINTERS_4(__VA_ARGS__)
    #define PRINTERS_N(_6,_5,_4,_3,_2,_1,N,...)  PRINTERS##N
    #define PRINTERS(...)  PRINTERS_N(__VA_ARGS__,_6,_5,_4,_3,_2,_1)(__VA_ARGS__)
    
    // The entrypoint
    #define pairs(pre, ...)  \
        pairs_in(pre, (const printer_t[]){ \
            PRINTERS(__VA_ARGS__), NULL, }\
            __VA_OPT__(,) \
            __VA_ARGS__ \
        )
    
    int main() {
        pairs("This is a beautiful value",
            "value", 10,
            "hello", "bye");
    }
    

    Now you can expand this method, by for example including in the "hello" a format specifier like you wanted, and then extract it. For example "%d\0hello", if the string starts with %, split it on \0 and use the leading part for printing inside the printer. I think this is the interface you aimed for.

    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    
    // printers for pairs
    const char *_extract_fmt(const char *text, const char *def) {
            return text[0] == '%' ? text : def;
    }
    
    const char *_extract_name(const char *text) {
            return text[0] == '%' ? text + strlen(text) + 1 : text;
    }
    
    void pairs_int(const char *text, va_list *va) {
            const int v = va_arg(*va, int);
            printf(_extract_fmt(text, "%d"), v);
    }
    void pairs_charp(const char *text, va_list *va) {
            char *v = va_arg(*va, char *);
            printf(_extract_fmt(text, "%s"), v);
    }
    
    #define PRINTER(x) _Generic((x), int: &pairs_int, char *: &pairs_charp)
    
    typedef void (*printer_t)(const char *text, va_list *va);
    
    // the actual logic
    void pairs_in(const char *pre, const printer_t printers[], ...) {
            va_list va;
            va_start(va, printers);
            fputs(pre, stdout);
            for (const printer_t *printer = printers; *printer; printer++) {
                    putchar(' ');
                    char *text = va_arg(va, char *);
                    fputs(_extract_name(text), stdout);
                    putchar('=');
                    (*printer)(text, &va);
                    fflush(stdout);
            }
            va_end(va);
            putchar('\n');
    }
    
    // Convert arguments into an array of printers
    #define PRINTERS_0()
    #define PRINTERS_2(a, b) PRINTER(b)
    #define PRINTERS_4(a, b, ...) PRINTERS_2(a, b), PRINTERS_2(__VA_ARGS__)
    #define PRINTERS_6(a, b, ...) PRINTERS_2(a, b), PRINTERS_4(__VA_ARGS__)
    #define PRINTERS_N(_6, _5, _4, _3, _2, _1, N, ...) PRINTERS##N
    #define PRINTERS(...) PRINTERS_N(__VA_ARGS__, _6, _5, _4, _3, _2, _1)(__VA_ARGS__)
    
    // The entrypoint
    #define pairs(pre, ...) \
            pairs_in(pre, (const printer_t[]){ \
                              PRINTERS(__VA_ARGS__), \
                              NULL, \
                          } __VA_OPT__(, ) __VA_ARGS__)
    
    int main() {
            pairs("This is a beautiful value", "%10d\0value", 10, "hello", "bye");
    }
    

    Or you can expand such a library and start supporting python format strings and so on, which is what I did with now dead yio library, as I do not have time for it.