haskell types infinite typechecking type-systems

How does Haskell type-check infinite recursive values?

Define this data type:

data NaturalNumber = Zero | S NaturalNumber
    deriving (Show)

In Haskell (compiled using GHC), this code will run without warning or error:

infinity = S infinity
inf1 = S inf2
inf2 = S inf1

So both recursive and mutually recursive infinitely deep values pass the type check.

However, the following code gives an error:

j = S 'h'

The error states Couldn't match expected type ‘NaturalNumber’ with actual type 'Char'. The (same) error persists even if I set

j = S (S (S (S ... (S 'h')...)))

with a hundred or so nested S's.

How can Haskell tell that infinity is a valid member of NaturalNumber but j is not?

Interestingly, it also allows:

bottom = bottom
k = S bottom

Does Haskell merely try to prove incorrectness of a program, and if it fails to do so then allows it? Or is Haskell's type system not Turing complete, so if it allows the program then the program is provably (at the type level) correct?

(If the type system (in the formal semantics of Haskell, instead of only the type checker) is Turing complete, then it will either fail to realize some correctly typed programs are correct or some incorrectly typed programs are incorrect, due to the undecidability of the halting problem.)

Solution

Well

S :: NaturalNumber -> NaturalNumber

infinity = S infinity

We start by assuming nothing: we assign infinity some unsolved type _a and try to figure out what it is. We know that we have applied S to infinity, so _a must be whatever is on the left side of the arrow in the constructor’s type, which is NaturalNumber. We know that infinity is the result of an application of S, so infinity :: NaturalNumber, again (if we got two conflicting types for this definition, we’d have to emit a type error).

Similar reasoning holds for the mutually recursive definitions. inf1 must be a NaturalNumber because it appears as an argument to S in inf2; inf2 must be a NaturalNumber because it is the result of S; etc.

The general algorithm is to assign definitions unknown types (notable exceptions are literals and constructors), and then to create constraints on those types by seeing how every definition is used. E.g. this must be some form of list because it’s being reversed, and this must be an Int because it’s being used to look up a value from an IntMap, etc.

In the case of

oops = S 'a'

'a' :: Char as it’s a literal, but, also, we must have 'a' :: NaturalNumber because it’s used as an argument to S. We get the obviously bogus constraint that the type of the literal must both be Char and NaturalNumber, which causes a type error.

And in

bottom = bottom

We start with bottom :: _a. The only constraint is _a ~ _a, because a value of type _a (bottom) is being used where a value of type _a is expected (on the RHS of the definition of bottom). Since there is nothing to further constrain the type, the unsolved type variable is generalized: it gets bound by a universal quantifier to produce bottom :: forall a. a.

Note how both uses of bottom above have the same type (_a) while inferring the type of bottom. This breaks polymorphic recursion: every occurrence of a value within its definition is taken to be of the same type as the definition itself. E.g.

-- perfectly balanced binary trees
data Binary a = Leaf a | Branch (Binary (a, a))
-- headB :: _a -> _r
headB (Leaf x) = x -- _a ~ Binary _r; headB :: Binary _r -> _r
headB (Branch bin) = fst (headB bin)
-- recursive call has type headB :: Binary _r -> _r
-- but bin :: Binary (_r, _r); mismatch

So you need a type signature:

headB :: {-forall a.-} Binary a -> a
headB (Leaf x) = x
headB (Branch bin) = fst (headB {-@(a, a)-} bin)
-- knowing exactly what headB's signature is allows for polymorphic recursion

So: when something doesn't have a type signature, the type checker tries to assign it a type, and if it comes across a bogus constraint on its way, it rejects the program. When something has a type signature, the type checker descends into it to make sure it's correct (tries to prove it wrong, if you prefer to think of it that way).

Haskell’s type system is not Turing complete because there are heavy syntactic restrictions to prevent e.g. type lambdas (without language extensions), but it doesn’t suffice to make sure all programs run to completion without error because it still allows bottoms (not to mention all the unsafe functions). It provides the weaker guarantee that, if a program runs to completion without using an unsafe function, it will remain type-correct. Under GHC, with sufficient language extensions, the type system does become Turing complete. I don't think it allows ill-typed programs through; I think the most you can do is throw the compiler into an infinite loop.