Search code examples
ccompilationcompiler-constructionbcpl

How could one possibly bootstrap a C compiler(from source)?


I was looking into compiler bootstrapping, and I looked at how Golang implements bootstrapping from source, i.e., by building the last version of Golang implemented in C and using the generated executable to compile newer Go releases. This made me curious as to how the same could be done with C. Can you construct a C compiler on a computer with literally nothing present on it? If not, then how can I trust that the binary of the compiler I use doesn't automatically fill the binaries it compiles with spyware?

Related question, since the first C compiler was written in B and B was written in BCPL, what was BCPL written in?


Solution

  • Can you construct a C compiler on a computer with literally nothing present on it?

    The main issue is how (in 2021) would you write a program for that computer! And how would you input it?

    In the 1970s computers (like IBM 360 mainframes) had many mechanical switches to enter some initial program. In the 1960s, they had even more, e.g. IBM1620.

    Today, how would you input that initial program? Did you consider using some Arduino ? Even oscilloscopes today contain microprocessors with programs....

    Some hobbyists today have designed (and spent a lot of money) in making - a few years ago - computers with mechanical relays. These are probably thousands times slower than the cheapest laptop computer you could buy (or the micro-controller inside your computer mouse - and your mouse contains some software too).

    You could also buy many discrete transistors (e.g. thousands of 2N2222) and make a computer by soldering them.

    Even a cheap motherboard (like e.g. MSI A320M A-PRO) has today some firmware program called UEFI or BIOS. It is shipped with that program.... and rumored to be mostly written in C (several dozen of thousands of statements).

    In some ways, computer chips are "software" coded in VHDL, SystemC, etc... etc...

    However, you can in principle still bootstrap a C compiler in 2021.

    Here is an hypothetical tale....

    Imagine you have today a laptop running a small Linux distribution on some isolated island (à la Robinson Crusoe), without any Internet connection - but with books (including Modern C and some book about x86-64 assembly and instruction set architecture and many other books in paper form), pencils, papers, food and a lot of time to spend. Imagine that system does not have any C compiler (e.g. because you just removed by mistake the gcc package from some Debian distribution), but just GNU binutils (that is, the linker ld and the assembler gas), some editor in binary form (e.g. GNU emacs or vim), GNU bash and GNU make as binary packages. We assume you are motivated enough to spend months in writing a C compiler. We also assume you have access to man pages in some paper form (notably elf(5) and ld(1)...). We have to assume you can inspect a file in binary form with od(1) and less(1).

    Then you could design on paper a subset µC of the C language in EBNF notation. With months of efforts, you can write a small assembler program, directly doing syscalls(2) (see Linux Assembly HowTo) and interpreting that µC language (since writing an interpreter is easier than writing a compiler; read for example the Dragon book, and Queinnec's Lisp In Small Pieces and Scott's programming language pragmatics book).

    Once you have your tiny µC interpreter, you can write a naive µC compiler in µC (since Fabrice Bellard has been able to write his tinyC compiler).

    Once you have debugged that µC compiler, you can extend it to accept all the syntax and semantics of C.

    Once you have a full C compiler, you could improve it to optimize better, maybe extend it to accept a small subset of C++, and you might also write a static C code analyzer inspired by Frama-C.

    PS. Bootstrapping can be generalized a lot - see Pitrat's blog on bootstrapping artificial intelligence (Jacques Pitrat, born in 1934, died in october 2019) and the RefPerSys project.