There are multiple languages that have a reference implementation of their compiler/libraries, so why doesn't C have one?

I know that GCC and glibc are used extensively and that Microsoft has its own version that it also uses, but why isn't there one main implementation, as there is for, say, Python? (Yes, I know there are other implementations, but there is one main/reference Python.)

Does it have to do with the fact that OSes like Linux and Windows implement at least part of their API in C? Thank you.
This is really more of a historical than a technical question, I suppose. The short answer is: there was one, of a sort. It didn't take. The long answer is rather longer, depending on how much detail you want to go into.
C is old. I mean, Python is also old, but C is really old. Dennis Ritchie hacked the first versions of it together at Bell Labs in the early 1970s, and he did it so that he wouldn't have to write UNIX in assembly or B (a now mostly forgotten systems programming language of the time whose shortcomings C was made to address).
This is arguably a completely different approach from how languages are designed today: C was written for the purpose of writing UNIX. Nobody sat down and designed C for the sake of designing a pretty and clean systems programming language; these guys wanted to write an operating system and built a tool that allowed them to do that more easily.
However, this Bell Labs C compiler was a sort of reference implementation, in that C was essentially whatever the C team wrote into their compiler. Then came the Portable C Compiler (pcc), which was supposed to make porting C to new platforms easier, and the book titled "The C Programming Language" (a.k.a. the informal K&R C specification), and C became popular, and all was well. And then things became complicated.
C had become popular at a time when it still had...let's be charitable and say "some room for improvement." Of course it had; every language always has room for improvement. There was no real standard library, for starters. Function arguments were not checked at compile time, functions could not return `void` or `struct`s, that sort of thing. It was still rough around the edges.
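To make that concrete, here's a small sketch of what pre-standard, K&R-style C looked like (the function is made up, but the syntax is the historical one, and it still compiles today, if grudgingly):

```c
#include <stdio.h>

/* K&R-style definition: parameter types are declared between the
   parameter list and the function body, and there is no prototype
   anywhere to check calls against. */
int add(a, b)
int a;
int b;
{
    return a + b;
}

int main()
{
    /* With no prototype in scope, a pre-ANSI compiler would also
       have accepted add(1.5) or add(1, 2, 3) without complaint. */
    printf("%d\n", add(1, 2));
    return 0;
}
```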
But now there was not just Bell Labs, oh no. Now there were dozens of vendors, all with their own adaptations of pcc or homegrown compilers, and lots of great and sometimes not-so-great ideas about the ways in which C could be improved. They, too, were less interested in designing a language than in building a tool that would make their actual jobs simpler. So they wrote their own extensions to the language, and sometimes these didn't mesh well with the extensions that other people came up with.
Now, it's easy to facepalm at this point and ask why they didn't just coordinate the language development better, but it really wasn't that simple. The Internet didn't exist yet, so they couldn't just set up an IRC channel to discuss stuff, and also...well, programming languages weren't the only thing that was messy compared to today.
Today, most computers are fairly similar. We all represent negative integers in two's complement, bytes are very nearly always 8 bits wide, and pointers are simply memory addresses. This was not the case at the time: when C was standardized, there were still machines around that used one's complement or sign-magnitude representations, which gives you an idea of why signed overflow is undefined in C.

Have you ever seen the C code of an old DOS program? It had this concept of near and far pointers, because the old 16-bit x86 computers needed special segment registers to address more than 64 KB of RAM. Several compilers built C extensions for this, but believe me, you're very, very glad that C today does not include this concept. The Soviets even built a balanced ternary computer (the Setun), although I'm unsure whether it ever had C support. In short, the hardware landscape was also messy, and that is kind of a big deal for a language that's close to the metal.
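As a small illustration of what that undefinedness means in practice (the function name here is made up): because signed overflow is undefined, a modern optimizing compiler may assume it never happens and is allowed to compile the following check down to `return 1;`:

```c
#include <limits.h>

/* For every x where the program's behavior is defined, x + 1 > x
   holds, so the compiler may fold this whole function to 1. */
int is_less_than_successor(int x)
{
    return x + 1 > x;   /* undefined behavior when x == INT_MAX */
}
```

On a one's complement or sign-magnitude machine, nailing down one specific overflow behavior would have cost extra instructions on every signed addition; leaving it undefined let each implementation do whatever its hardware did naturally.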
So, everybody did what they had to and generally (though not always) the best they could, but the language necessarily diverged. A language core was eventually standardized in 1989 (when the Undertaker...hold on, wrong year) as ANSI C, bringing back some semblance of order, and in the years after that compilers began to converge on it. Nevertheless, some of the old extensions will never go away, because backwards compatibility is always an issue -- consider how quickly Python 3 was adopted -- and there are some close-to-the-metal issues that need to be addressed for the language to be useful but cannot sensibly be written into the spec because they are not portable, such as calling conventions.
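To make the calling-convention point concrete, here's a hedged sketch (the function name is invented) of how two real compilers spell the same nonstandard concept on 32-bit x86:

```c
/* The standard says nothing about calling conventions, so each
   compiler grew its own spelling for them. */
#ifdef _MSC_VER
/* Microsoft's keyword: the callee cleans up the stack. */
int __stdcall window_callback(int message);
#elif defined(__GNUC__)
/* GCC's attribute syntax for the same convention. */
int __attribute__((stdcall)) window_callback(int message);
#endif
```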
And there you have it. The reason that C has a language specification rather than a reference implementation is mostly historical and partly due to subtleties of the different machines on which it has to run.
I suppose it would be possible to develop an official reference implementation (at least for a few common platforms), but I also believe that its value would be limited. After all, the C standard has to leave a number of things undefined or implementation-defined because it cannot know the exact nature of the underlying machine, so other implementations would only be guaranteed to behave like the reference implementation for code that stays away from those areas. And for such strictly conforming code, the usual C implementations (e.g. GCC, Clang, MSVC) generally behave the same way anyway, so you can use any one of them.
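To illustrate that limit with a perfectly legal little program: both lines below may print different answers on different conforming implementations, so even a reference implementation could only tell you what it does, not what C does:

```c
#include <stdio.h>

int main(void)
{
    /* Whether plain char is signed or unsigned is
       implementation-defined; both answers are conforming. */
    char c = (char)0xFF;
    printf("%d\n", (int)c);   /* may print -1 or 255 */

    /* Right-shifting a negative signed value is
       implementation-defined as well. */
    printf("%d\n", -1 >> 1);  /* usually -1, but not guaranteed */

    return 0;
}
```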