Need assistance understanding C code about newlines


This question references figure 2 of Ken Thompson's "Reflections on Trusting Trust".

Take a look at this snippet of code, from figure 2:

...
c = next( );
if(c != '\\')
    return(c);
c = next( );
if (c == '\\')
    return('\\');
if (c == 'n')
    return('\n');

It says:

This is an amazing piece of code. It "knows" in a completely portable way what character code is compiled for a new line in any character set. The act of knowing then allows it to recompile itself, thus perpetuating the knowledge.

I would like to read the rest of the paper. Can someone explain how the above code is recompiling itself? I'm not sure I understand how this snippet of code relates to the code in "Stage 1":

Stage 1
(source: bell-labs.com)


Solution

  • The stage 2 example is interesting because it adds an extra level of indirection on top of a self-replicating program.

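    What he means is that this compiler code is written in C, so it never has to know the actual character code for a newline. When it sees the literal \n in the source it is compiling, it returns '\n', and the compiler that built it fills in the correct value for the system. That is why the snippet is portable: the numeric code lives only in the compiled binary, and each recompilation passes it along to the next binary, which is the sense in which the code "recompiles itself".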

    The paper goes on to show a very interesting Trojan horse built into the compiler. If you use the same technique to make the compiler insert a bug into any program it compiles, including new versions of itself, and then remove the bug from the compiler's source code, the existing compiler binary will still compile the bug into the supposedly bug-free compiler.

    It is a bit confusing, but essentially it is about multiple levels of indirection.
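
To make the portability point concrete, here is a minimal sketch of the escape-handling logic from the question, wrapped so it can be compiled on its own. The names `src`, `next_char`, `read_char`, and `decode_first` are hypothetical and are not from the paper; `next_char` stands in for the paper's `next( )`. Note that no numeric character code appears anywhere in the source.

```c
#include <stdio.h>

/* Hypothetical cursor over the text being "compiled". */
const char *src;

/* Stand-in for the paper's next(): return the next input character, or EOF. */
int next_char(void)
{
    return *src ? (unsigned char)*src++ : EOF;
}

/* Decode one character of a literal, translating the escapes \\ and \n. */
int read_char(void)
{
    int c = next_char();
    if (c != '\\')
        return c;       /* ordinary character: pass it through */
    c = next_char();
    if (c == '\\')
        return '\\';    /* two backslashes in the input mean one backslash */
    if (c == 'n')
        return '\n';    /* value supplied by whichever compiler builds this file */
    return c;           /* other escapes are not handled in this sketch */
}

/* Hypothetical convenience wrapper: decode the first character of text. */
int decode_first(const char *text)
{
    src = text;
    return read_char();
}
```

Calling `decode_first("\\n")` returns whatever value the compiler that built this file assigns to `'\n'` (10 on an ASCII system). The source never states that number; it is inherited from the existing compiler binary, which is the knowledge the paper describes as being perpetuated.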