Tags: c, shell, character-encoding, code-generation, code-injection

How can I sanitize user input into valid C string literals?


I'm trying to use a shell script to generate C code for wrapping executables.

This needs to work on Linux and macOS with as few dependencies as possible. I don't care about Windows (other than WSL2). The generated code should look something like this:

#include <unistd.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    putenv("X=1");
    putenv("HELLO=WORLD");
    argv[0] = "/usr/bin/python3";
    return execv("/usr/bin/python3", argv);
}

Naive approach:

# make-c-wrapper.sh EXECUTABLE ARGS
#
# ARGS:
# --argv0       NAME    : set name of executed process to NAME
#                         (defaults to EXECUTABLE)
# --set         VAR VAL : add VAR with value VAL to the executable’s
#                         environment

# printf is used here because echo does not expand \n consistently across shells
printf '#include <unistd.h>\n#include <stdlib.h>\n\nint main(int argc, char **argv) {\n'
executable="$1"
params=("$@")

for ((n = 1; n < ${#params[*]}; n += 1)); do
    p="${params[$n]}"
    if [[ "$p" == "--set" ]]; then
        key="${params[$((n + 1))]}"
        value="${params[$((n + 2))]}"
        n=$((n + 2))
        echo "    putenv(\"$key=$value\");"
    elif [[ "$p" == "--argv0" ]]; then
        argv0="${params[$((n + 1))]}"
        n=$((n + 1))
    else
        # Using an error macro, we will make sure the compiler gives an understandable error message
        echo "    #error make-c-wrapper.sh did not understand argument $p"
    fi
done

printf '    argv[0] = "%s";\n    return execv("%s", argv);\n}\n' "${argv0:-$executable}" "$executable"

But this fails if you try to supply special characters in the input:

./make-c-wrapper.sh /usr/bin/python3 --set "Hello" "This is\"\na test"
#include <unistd.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    putenv("Hello=This is"
a test");
    argv[0] = "/usr/bin/python3";
    return execv("/usr/bin/python3", argv);
}

What I would have liked to see here is this:

#include <unistd.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    putenv("Hello=This is\"\na test");
    argv[0] = "/usr/bin/python3";
    return execv("/usr/bin/python3", argv);
}

According to this answer: https://stackoverflow.com/a/12208808/8008396, it seems like I need to escape the following characters to make sure the result is a valid C string literal: ", \, \r, \n, \0 and \?.

Is there an easy way to do this? And it needs to work on MacOS, not just Linux.


Solution

  • it seems like I need to escape the following characters to make sure the result is a valid C string literal: ", \, \r, \n, \0 and \?.

    You need to escape ", \, and newline. While you're at it, it makes sense to escape the carriage return as well. Although there is an escape sequence for ?, that character can also represent itself; \? only matters when the input contains consecutive question marks and the compiler has trigraphs enabled. A null character would terminate the C string early even if you wrote it as \0, and shell variables cannot hold null bytes anyway, so you would probably be best off not giving it any special consideration.

    Bash's parameter expansion syntax has a substring replacement feature (${var//pattern/replacement}) and a C-like literal syntax ($'...') that you could leverage. Both work in Bash, including the Bash 3.2 that ships with macOS, but not necessarily in a minimal /bin/sh. The shell quoting gets a little involved, but for example, this ...

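    # Escapes $1 for use inside a C string literal; the result is left in $result.
    escape_string_literal() {
        result=${1//'\'/'\\'}            # backslash first, so later escapes are not doubled
        result=${result//\"/'\"'}        # double quote
        result=${result//$'\n'/'\n'}     # newline -> \n
        result=${result//$'\r'/'\r'}     # carriage return -> \r
    }
    
    escape_string_literal '"boo\"'
    # printf rather than echo: some shells' echo would interpret the backslashes
    printf '%s\n' "${result}"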
    

    ... prints

    \"boo\\\"
    
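To connect this back to the generator in the question, here is a minimal sketch of how the same substitutions might be applied when emitting a putenv() line. The names c_escape and emit_putenv are made up for this illustration, and the sketch assumes Bash (it uses ${var//...} and $'...').

```shell
#!/usr/bin/env bash

# Hypothetical helper: print $1 escaped for use inside a C string literal.
c_escape() {
    local s=$1
    s=${s//'\'/'\\'}        # backslash first, so later escapes are not doubled
    s=${s//\"/'\"'}         # double quote
    s=${s//$'\n'/'\n'}      # newline -> \n
    s=${s//$'\r'/'\r'}      # carriage return -> \r
    printf '%s' "$s"
}

# Hypothetical helper: emit one putenv() call for VAR=VAL, escaping both parts.
emit_putenv() {
    printf '    putenv("%s=%s");\n' "$(c_escape "$1")" "$(c_escape "$2")"
}

# The value contains a real double quote and a real newline.
emit_putenv Hello $'This is"\na test'
```

With that input, the sketch prints the line the question asked for: `    putenv("Hello=This is\"\na test");`. Passing the values as printf arguments rather than splicing them into the format string keeps printf itself from interpreting any backslashes in them.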

    Do note, however, that this does not make every other character safe to put in a string literal. In particular, even though there are no single-character escapes for most of them, other control characters might or might not be accepted as literal characters, depending on your C implementation.
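
If you would rather not rely on the compiler accepting raw control characters, one option is to write every ASCII control character as a three-digit octal escape. The following is a rough sketch under that assumption, not a drop-in solution: the function name escape_with_octal is invented here, it requires Bash, and it passes non-ASCII bytes through unchanged.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print $1 as the body of a C string literal, writing
# ASCII control characters (codes 1-31 and 127) as three-digit octal escapes.
escape_with_octal() {
    local s=$1 out='' c code i
    for ((i = 0; i < ${#s}; i++)); do
        c=${s:i:1}
        case $c in
            '\') out+='\\' ;;
            '"') out+='\"' ;;
            *)
                # A leading quote makes printf yield the character's numeric code.
                printf -v code '%d' "'$c"
                if ((code < 32 || code == 127)); then
                    printf -v c '\\%03o' "$code"
                fi
                out+=$c
                ;;
        esac
    done
    printf '%s\n' "$out"
}

escape_with_octal $'a\tb"c\\d\ne'
# prints: a\011b\"c\\d\012e
```

Octal escapes are used instead of hex ones because an octal escape consumes at most three digits, so a digit that follows it in the input cannot be absorbed into the escape. Newlines come out as \012 rather than \n here, which means the same thing to the C compiler.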