
Questions about C as an intermediate language


I'm writing a language that compiles to C. By "intermediate language" (IL) I mean that I generate C code, which another C compiler such as gcc or clang then turns into assembly.

Regarding the C code I generate:

  • If I do some simple optimisation passes (constant propagation, dead code removal, ...), will this reduce the amount of work the C compiler has to do, or make things harder because the result isn't really human-written C?
  • If I were to compile to, say, three-address code or SSA or some other form, and then emit that as a C program with functions, labels, and variables, would that make it easier or harder for the C compiler to optimise?
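
For concreteness, here is a minimal sketch of the two shapes being compared in the second point, assuming a simple arithmetic expression as input. The function names and the temporaries `t1`, `t2`, `t3` are hypothetical and only serve as illustration.

```c
/* Direct form: one C expression, as a human would write it. */
int eval_direct(int a, int b, int c, int d)
{
    return (a + b) * (c - d);
}

/* Three-address form: one operation per statement, with each result
   stored in its own single-assignment temporary (hypothetical names). */
int eval_three_address(int a, int b, int c, int d)
{
    const int t1 = a + b;
    const int t2 = c - d;
    const int t3 = t1 * t2;
    return t3;
}
```

Both functions compute the same value; the question is whether the second shape helps or hinders the downstream C compiler.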

These link together to form the following questions:

  • What is the best way to produce good C code from a language that compiles to C?
  • Is it worth doing any optimisations at all, or should I leave that to the compiler?

Solution

  • Generally there's not much point doing peephole-type optimisations, because the C compiler will simply do those for you. What is expensive is a) wasted or unnecessary "gift-wrapping" operations, b) memory accesses, c) branch mispredictions.

    For a), make sure you're not passing data around more than necessary: whilst the C compiler will do constant propagation, there's a limit to how far it can detect that two buffers are in fact aliases of the same underlying data. For b), try to keep functions short and keep operations on the same data together, and limit heap memory use to improve cache performance. For c), the compiler understands for loops, but it doesn't understand goto loops as well. So it will figure that

    for(i=0;i<N;i++) 
    

    will usually execute the loop body, but it won't necessarily figure that

    if(++i < N) goto do_loop_again;
    

    will usually take the jump.

    So really the rule is to make your generated code as human-like as possible. Though if it's too human-like, that raises the question of what your language has to offer that C doesn't: the whole point of a non-C language is that the spaghetti of gotos ends up in the generated C source, while the input script keeps a nice structure.
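
As a rough illustration of point c), the sketch below shows the same loop in the two shapes a code generator might emit. The function names and the summing task are assumptions made up for this example; only the `do_loop_again` label comes from the fragment above. Whether a particular compiler treats the two differently depends on the compiler and its optimisation level, so this is a comparison to try rather than a guaranteed result.

```c
#include <stddef.h>

/* Structured form: the shape a human would write. */
long sum_structured(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Flattened form: the same loop expressed with labels and gotos,
   as a naive code generator might emit it. */
long sum_flattened(const int *values, size_t count)
{
    long total = 0;
    size_t i = 0;
    if (count == 0) goto done;   /* skip the body for an empty input */
do_loop_again:
    total += values[i];
    if (++i < count) goto do_loop_again;
done:
    return total;
}
```

Both functions return the same result for any input. Compiling each to assembly (for example with the `-S` flag of gcc or clang) is one way to see how your compiler handles the two shapes.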