Unit test C-generating python code

I have a project which involves some (fairly simple) C-code generation as part of a build system. In essence, I have some version information associated with a project (embedded C) which I want to expose in my binary, so that I can easily determine what firmware version was programmed to a particular device for debugging purposes.

I'm writing some simplistic python tools to do this, and I want to make sure they're thoroughly tested. In general, this has been fairly straightforward, but I'm unsure what the best strategy is for the code-generation portion. Essentially, I want to make sure that the generated files both:

Are syntactically correct
Contain the necessary information

The second, I can (I believe) achieve to a reasonable degree with regex matching. The first, however, is something of a bigger task. I could probably use something like pycparser and examine the resulting AST to accomplish both goals, but that seems like an unnecessarily heavyweight solution.

Edit: A dataflow diagram of my build hierarchy

dataflow diagram

Solution

Thanks for the diagram! Since you are not testing for coverage, if it were me, I would just compile the generated C code and see if it worked :) . You didn't mention your toolchain, but in a Unix-like environment, gcc <whatever build flags> -c generated-file.c || echo 'Oops!' should be sufficient.

Now, it may be that the generated code isn't a freestanding compilation unit. No problem there: write a shim. Example shim.c:

#include <stdio.h>
#include "generated-file.c"
main() { 
    printf("%s\n", GENERATED_VERSION);  //or whatever is in generated-file.c
}

Then gcc -o shim shim.c && diff <(./shim) "name of a file holding the expected output" || echo 'Oops!' should give you a basic test. (The <() is bash process substitution.) The file holding the expected results may already be in your git repo, or you might be able to use your Python routine to write it to disk somewhere.

Edit 2 This approach can work even if your actual toolchain isn't amenable to automation. To test syntactic validity of your code, you can use gcc even if you are using a different compiler for your target processor. For example, compiling with gcc -ansi will disable a number of GNU extensions, which means code that compiles with gcc -ansi is more likely to compile on another compiler than is code that compiles with full-on, GNU-extended gcc. See the gcc page on "C Dialect Options" for all the different flavors you can use (ditto C++).

Edit Incidentally, this is the same approach GNU autoconf uses: write a small test program to disk (autoconf calls it conftest.c), compile it, and see if the compilation succeeded. The test program is (preferably) the bare minimum necessary to test if everything is OK. Depending on how complicated your Python is, you might want to test several different aspects of your generated code with respective, different shims.