c-preprocessor theory boost-preprocessor turing-complete

Is the C99 preprocessor Turing complete?

After discovering the Boost preprocessor's capabilities I found myself wondering: Is the C99 preprocessor Turing complete?

If not, what does it lack to not qualify?

Solution

Here is an example of abusing the preprocessor to implement a Turing machine. Note that an external build script is needed to feed the preprocessor's output back into its input, so the preprocessor in and of itself isn't Turing complete. Still, it's an interesting project.

From the description of the afore-linked project:

the preprocessor is not Turing complete, at least not if the program is preprocessed only once. This is true even if the program is allowed to include itself. (The reason being that for a given program, the preprocessor has only a finite number of states, plus a stack consisting of the places which the file has been included from. This is only a push-down automaton.)

The answer by Paul Fultz II is quite impressive and certainly closer than I thought the preprocessor could ever get, but it's not a true Turing machine. The C preprocessor has certain limits that prevent it from executing an arbitrary program like a Turing machine could, even if you had infinite memory and time. Section 5.2.4.1 of the C spec gives the following minimum limits for a C compiler:

63 nesting levels of parenthesized expressions within a full expression

63 significant initial characters in an internal identifier or a macro name

4095 macro identifiers simultaneously defined in one preprocessing translation unit

4095 characters in a logical source line

The counter mechanism below requires a macro definition per value, so the macro definition limit will limit how many times you can loop (EVAL(REPEAT(4100, M, ~)) would yield undefined behavior). This essentially puts a cap on the complexity of the program that you can execute. The nesting and complexity of the multi-level expansions may hit one of the other limits as well.

This is fundamentally different than the "infinite memory" limitation. In this case, the spec specifically says that a standards-conforming C compiler is only required to conform to these limits, even if it has infinite time, memory, etc. Any input file exceeding these limits can be processed in an unpredictable or undefined manner (or outright rejected). Some implementations may have higher limits, or no limits at all, but that's considered "implementation-specific" and not part of the standard. It may be possible to use Paul Fultz II's method to implement something like a Turing machine on some specific compiler implementation that has no finite limits, but in a general sense of "can this be done on any arbitrary, standards-conforming C99 pre-processor", the answer is no. Since the limit here is built into the language itself and not simply a side-effect of our inability to construct an infinite computer, I say that breaks Turing completeness.