Tags: oop, math, programming-languages, model, algebra

Is there any Mathematical Model or Theory behind Programming Languages?


RDBMSs are based on relational algebra and Codd's relational model. Do we have anything similar for programming languages or OOP?


Solution

  • Do we have [an underlying model] for programming languages?

    Heavens, yes. And because there are so many programming languages, there are multiple models to choose from. Most important first:

    • Church's untyped lambda calculus is a model of computation that is as powerful as a Turing machine (no more and no less). The famous "Church-Turing thesis" is that these two equivalent models together capture the most general notion of computation that we know how to implement. The lambda calculus is extremely simple; in its entirety the language is

      e ::= x | e1 e2 | \x.e
      

      which constitute variables, function applications, and function definitions. The lambda calculus also comes with a fairly large collection of "reduction rules" for simplifying expressions. If you find an expression that can't be reduced, that is called a "normal form" and represents a value.

      The lambda calculus is so general that you can take it in several directions.

      • If you want to use all the available rules, you can write specialized tools like partial evaluators and parts of compilers.

      • If you avoid reducing any subexpression under a lambda, but otherwise use all the rules available, you wind up with a model of a lazy functional language like Haskell or Clean. In this model, if a reduction can terminate, it is guaranteed to, and it is easy to represent infinite data structures. Very powerful.

      • If you avoid reducing any subexpression under a lambda, and if you also insist on reducing each argument to a normal form before a function is applied, then you have a model of an eager functional language like F#, Lisp, Objective Caml, Scheme, or Standard ML.
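
      As a concrete illustration of these reduction disciplines, here is a minimal Haskell sketch of the untyped calculus with capture-avoiding substitution and full normal-order reduction (the "use all the rules" case above); the data type and function names are just one possible choice. Stopping the reducer at lambdas would model the lazy languages, and reducing arguments to values before substituting would model the eager ones.

      import Data.List (delete, union)

      -- Terms of the untyped lambda calculus: e ::= x | e1 e2 | \x.e
      data Term = Var String
                | App Term Term
                | Lam String Term
                deriving Show

      -- Free variables of a term.
      free :: Term -> [String]
      free (Var x)   = [x]
      free (App f a) = free f `union` free a
      free (Lam x b) = delete x (free b)

      -- Capture-avoiding substitution: subst x s t replaces free
      -- occurrences of x in t with s, renaming bound variables that
      -- would otherwise capture free variables of s.
      subst :: String -> Term -> Term -> Term
      subst x s (Var y)
        | x == y             = s
        | otherwise          = Var y
      subst x s (App f a)    = App (subst x s f) (subst x s a)
      subst x s (Lam y b)
        | y == x             = Lam y b                 -- x is shadowed here
        | y `notElem` free s = Lam y (subst x s b)
        | otherwise          = Lam y' (subst x s b')   -- rename y first
        where y' = fresh y (free s `union` free b)
              b' = subst y (Var y') b

      -- Pick a primed variant of y that clashes with nothing in avoid.
      fresh :: String -> [String] -> String
      fresh y avoid = head [v | n <- [1 ..], let v = y ++ replicate n '\'', v `notElem` avoid]

      -- One normal-order (leftmost-outermost) reduction step, reducing
      -- under lambdas; Nothing means the term is already a normal form.
      step :: Term -> Maybe Term
      step (App (Lam x b) a) = Just (subst x a b)
      step (App f a)         = case step f of
                                 Just f' -> Just (App f' a)
                                 Nothing -> App f <$> step a
      step (Lam x b)         = Lam x <$> step b
      step (Var _)           = Nothing

      -- Reduce to normal form (may diverge, exactly as the calculus can).
      normalize :: Term -> Term
      normalize t = maybe t normalize (step t)

      -- ghci> normalize (App (App (Lam "x" (Lam "y" (Var "x"))) (Var "a")) (Var "b"))
      -- Var "a"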

    • There are also several flavors of typed lambda calculi, of which the most famous are grouped under the name System F, which was discovered independently by Girard (in logic) and by Reynolds (in computer science). System F is an excellent model for languages like CLU, Haskell, and ML, which are polymorphic but have compile-time type checking. Hindley (in logic) and Milner (in computer science) discovered a restricted form of System F (now called the Hindley-Milner type system) which makes it possible to infer System F expressions from some expressions of the untyped lambda calculus. Damas and Milner developed an algorithm to do this inference, which is used in Standard ML and has been generalized in other languages.
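
      As a small illustration of Damas-Milner inference at work, consider a Haskell definition written with no type annotation at all; the compiler reconstructs its principal (most general) type:

      -- Function composition, with no annotation supplied by the programmer.
      compose f g x = f (g x)

      -- The inferred principal type (up to renaming of type variables):
      --   compose :: (b -> c) -> (a -> b) -> a -> c
      -- In System F, a, b, and c would be explicit type parameters.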

    • Lambda calculus is just pushing symbols around. Dana Scott's pioneering work in denotational semantics showed that expressions in the lambda calculus actually correspond to mathematical functions, and he identified which ones. Scott's work is especially important in making sense of "recursive definitions", which are commonplace in computer science but, read naively, are circular from a mathematical point of view. Scott and Christopher Strachey showed that a recursive definition is equivalent to the least-defined solution of a recursion equation, and furthermore showed how that solution could be constructed. Any language that allows recursion, and especially languages that allow recursion at arbitrary type (like Haskell and Clean), owes something to Scott's model.
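
      In a lazy language this least-fixed-point reading can even be written down directly, using the standard fix combinator (available as Data.Function.fix). A minimal Haskell sketch:

      -- fix f is the least fixed point of f:  fix f = f (fix f)
      fix :: (a -> a) -> a
      fix f = f (fix f)

      -- Factorial without explicit recursion: the least solution of
      --   fac = \n -> if n == 0 then 1 else n * fac (n - 1)
      factorial :: Integer -> Integer
      factorial = fix (\fac n -> if n == 0 then 1 else n * fac (n - 1))

      -- ghci> factorial 5
      -- 120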

    • There is a whole family of models based on abstract machines. Here there is not so much an individual model as a technique. You can define a language by using a state machine and defining transitions on the machine. This definition encompasses everything from Turing machines to von Neumann machines to term-rewriting systems, but generally the abstract machine is designed to be "as close to the language as possible." The design of such machines, and the business of proving theorems about them, comes under the heading of operational semantics.
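
      To give the flavor, here is a toy abstract machine sketched in Haskell: a tiny stack language defined entirely by a transition function on machine states (the instruction set and names are invented purely for illustration):

      -- Machine state: the remaining program and a stack of integers.
      data Instr = Push Integer | Add | Mul
        deriving Show

      type MachineState = ([Instr], [Integer])

      -- One transition of the machine; Nothing means it has halted (or is stuck).
      transition :: MachineState -> Maybe MachineState
      transition (Push n : is, stack)      = Just (is, n : stack)
      transition (Add : is, x : y : stack) = Just (is, (y + x) : stack)
      transition (Mul : is, x : y : stack) = Just (is, (y * x) : stack)
      transition _                         = Nothing

      -- Run the machine by iterating the transition function to a final state.
      run :: MachineState -> MachineState
      run s = maybe s run (transition s)

      -- ghci> run ([Push 2, Push 3, Add, Push 4, Mul], [])
      -- ([],[20])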

    What about object-oriented programming?

    I'm not as well educated as I should be about abstract models used for OOP. The models I'm most familiar with are very closely connected to implementation strategies. If I wanted to investigate this area further I would start with William Cook's denotational semantics for Smalltalk. (Smalltalk as a language is very simple, almost as simple as the lambda calculus, so it makes a good case study for modeling more complicated object-oriented languages.)
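
    To give a rough sense of that style of model (this is only an illustrative sketch, not Cook's actual construction), an object can be viewed as a record of functions, a "class" as a function from self to such a record, and the object itself as a fixed point of its class. In Haskell:

    -- An object is a record of methods.
    data Point = Point
      { getX  :: Integer
      , moved :: Integer -> Point   -- "methods" may return new objects
      }

    -- A class maps self-reference to a record of methods; tying the
    -- knot with a fixed point yields an object whose methods can refer
    -- to the object they belong to.
    pointClass :: Integer -> (Point -> Point)
    pointClass x self = Point
      { getX  = x
      , moved = \dx -> makeObject (pointClass (getX self + dx))
      }

    makeObject :: (Point -> Point) -> Point
    makeObject gen = let obj = gen obj in obj

    -- ghci> getX (moved (makeObject (pointClass 1)) 4)
    -- 5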

    Wei Hu reminds me that Martin Abadi and Luca Cardelli have put together an ambitious body of work on foundational calculi (analogous to the lambda calculus) for object-oriented languages. I don't understand the work well enough to summarize it, but here is a passage from the Prologue of their book, which I feel is worth quoting:

    Procedural languages are generally well understood; their constructs are by now standard, and their formal underpinnings are solid. The fundamental features of these languages have been distilled into formalisms that prove useful in identifying and explaining issues of implementation, static analysis, semantics, and verification.

    An analogous understanding has not yet emerged for object-oriented languages. There is no widespread agreement on a collection of basic constructs and on their properties... This situation might improve if we had a better understanding of the foundations of object-oriented languages.

    ... we take objects as primitive and concentrate on the intrinsic rules that objects should obey. We introduce object calculi and develop a theory of objects around them. These object calculi are as simple as function calculi, but represent objects directly.

    I hope this quotation gives you an idea of the flavor of the work.
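
    For a very rough taste of what "representing objects directly" can mean, here is a hedged Haskell sketch of the two primitive operations of an untyped object calculus; it is only an illustrative shallow embedding, not Abadi and Cardelli's formal system. An object is a collection of named methods, invoking a method passes the whole object to it as self, and a method can be replaced wholesale.

    -- An object is a map from method labels to methods; each method
    -- receives the receiver itself (self) and returns a result object.
    newtype Obj = Obj [(String, Obj -> Obj)]

    -- Method invocation o.l: look up l and apply it to the whole object.
    invoke :: Obj -> String -> Obj
    invoke o@(Obj ms) l = case lookup l ms of
      Just m  -> m o
      Nothing -> error ("no method " ++ l)

    -- Method update: replace the method bound to l with a new one.
    update :: Obj -> String -> (Obj -> Obj) -> Obj
    update (Obj ms) l m = Obj ((l, m) : filter ((/= l) . fst) ms)

    Making methods plain functions from the receiver to a result mirrors the calculus's invocation rule, in which the object itself is substituted for the method's self parameter.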