Search code examples
programming-languages

Is it possible to create a universal intermediate programming language?


What I mean is, is there a language or could one be designed, such that all high level programming languages could be compiled into this intermediate language?

This is excluding machine languages.


Solution

  • Every general-purpose language that is Turing-complete is a universal programming language.

    Two languages (or machines) are considered to be Turing-equivalent if any program for one can be compiled into a program for the other. A language is Turing-complete if it is Turing-equivalent to a Turing machine.

    There were several early efforts to formalize the notion of a computation; a Turing machine was one, the lambda calculus another, and the class of general recursive functions a third. Alonzo Church and Alan Turing proved that all three of these formalizations were Turing-equivalent; any program for a Turing machine could be compiled to the lambda calculus, and vice versa, as could any general recursive function be implemented by either the lambda calculus or a Turing machine, and again vice versa.

    The Church-Turing thesis hypothesizes that any computation that can be expressed in any formal system can be converted into a program that can run on a Turing machine; or equivalently, can be expressed in the untyped lambda calculus, or is general recursive, based on the equivalence described above.

    It is merely a hypothesis and cannot be formally proven, as there is no way to formally characterize the class of computations that are subject to it (without circular reasoning by defining them as the class of computations that can be performed by a Turing machine), but there has never been any proposed model of computation that is not possible to compute with a Turing machine.

    Because you can write a simulator of a Turing machine (or implementation of lambda calculus) in almost any general purpose language, and likewise those languages can be compiled to a program running on a Turing machine, pretty much all general purpose languages are Turing complete.

    There are, however, some languages which are not Turing complete; regular expressions are an example. They can be simulated by a Turing machine, but they cannot in turn simulate a Turing machine.

    Note that none of this addresses efficiency or access to host system resources; merely that the same computation can be expressed, and that it will eventually provide the same answer. There are some languages that are Turing complete in which there are some problems that cannot be computed at the same asymptotic efficiency as in other languages. And some languages provide access to external resources like the filesystem, I/O, networking, etc, while others which just allow computation in memory, but in any language that is Turing complete it would be possible to add an API or method of manipulating memory that allows it to access those external resources, so lack of access to system resources isn't a fundamental limitation, just a limitation of implementation.

    As a more practical matter, there are several languages that have been designed to be portable, intermediate languages that are targets of compilation. The LLVM IR is one commonly used example, C-- is another. Also, any bytecode for a language runtime acts this way, the JVM is a compilation target for many languages, the CLR is another. Finally, many languages compile to C, as C compilers are widely available and the code is more portable than machine code.

    And more recently, with the advent of the web and JavaScript being a language that is available in every web browser, JavaScript has become a popular target for compilation, both for languages that were designed to compile down to JavaScript like CoffeeScript and Dart, but also existing languages that were originally design to compile to machine code, via projects like Emscripten. Recognizing this usage, there has been effort to specify a subset of JavaScript, with more strict rules, known as asm.js, that makes a better target for compilation, while still allowing the same code to work backwards-compatibly with regular JavaScript engines that don't know anything about asm.js.