Define this data type:
data NaturalNumber = Zero | S NaturalNumber
deriving (Show)
In Haskell (compiled using GHC), this code will run without warning or error:
infinity = S infinity
inf1 = S inf2
inf2 = S inf1
So both recursive and mutually recursive infinitely deep values pass the type check.
However, the following code gives an error:
j = S 'h'
The error states Couldn't match expected type ‘NaturalNumber’ with actual type 'Char'
. The (same) error persists even if I set
j = S (S (S (S ... (S 'h')...)))
with a hundred or so nested S
's.
How can Haskell tell that infinity
is a valid member of NaturalNumber
but j
is not?
Interestingly, it also allows:
bottom = bottom
k = S bottom
Does Haskell merely try to prove incorrectness of a program, and if it fails to do so then allows it? Or is Haskell's type system not Turing complete, so if it allows the program then the program is provably (at the type level) correct?
(If the type system (in the formal semantics of Haskell, instead of only the type checker) is Turing complete, then it will either fail to realize some correctly typed programs are correct or some incorrectly typed programs are incorrect, due to the undecidability of the halting problem.)
Well
S :: NaturalNumber -> NaturalNumber
In
infinity = S infinity
We start by assuming nothing: we assign infinity
some unsolved type _a
and try to figure out what it is. We know that we have applied S
to infinity
, so _a
must be whatever is on the left side of the arrow in the constructor’s type, which is NaturalNumber
. We know that infinity
is the result of an application of S
, so infinity :: NaturalNumber
, again (if we got two conflicting types for this definition, we’d have to emit a type error).
Similar reasoning holds for the mutually recursive definitions. inf1
must be a NaturalNumber
because it appears as an argument to S
in inf2
; inf2
must be a NaturalNumber
because it is the result of S
; etc.
The general algorithm is to assign definitions unknown types (notable exceptions are literals and constructors), and then to create constraints on those types by seeing how every definition is used. E.g. this must be some form of list because it’s being reverse
d, and this must be an Int
because it’s being used to look up a value from an IntMap
, etc.
In the case of
oops = S 'a'
'a' :: Char
as it’s a literal, but, also, we must have 'a' :: NaturalNumber
because it’s used as an argument to S
. We get the obviously bogus constraint that the type of the literal must both be Char
and NaturalNumber
, which causes a type error.
And in
bottom = bottom
We start with bottom :: _a
. The only constraint is _a ~ _a
, because a value of type _a
(bottom
) is being used where a value of type _a
is expected (on the RHS of the definition of bottom
). Since there is nothing to further constrain the type, the unsolved type variable is generalized: it gets bound by a universal quantifier to produce bottom :: forall a. a
.
Note how both uses of bottom
above have the same type (_a
) while inferring the type of bottom
. This breaks polymorphic recursion: every occurrence of a value within its definition is taken to be of the same type as the definition itself. E.g.
-- perfectly balanced binary trees
data Binary a = Leaf a | Branch (Binary (a, a))
-- headB :: _a -> _r
headB (Leaf x) = x -- _a ~ Binary _r; headB :: Binary _r -> _r
headB (Branch bin) = fst (headB bin)
-- recursive call has type headB :: Binary _r -> _r
-- but bin :: Binary (_r, _r); mismatch
So you need a type signature:
headB :: {-forall a.-} Binary a -> a
headB (Leaf x) = x
headB (Branch bin) = fst (headB {-@(a, a)-} bin)
-- knowing exactly what headB's signature is allows for polymorphic recursion
So: when something doesn't have a type signature, the type checker tries to assign it a type, and if it comes across a bogus constraint on its way, it rejects the program. When something has a type signature, the type checker descends into it to make sure it's correct (tries to prove it wrong, if you prefer to think of it that way).
Haskell’s type system is not Turing complete because there are heavy syntactic restrictions to prevent e.g. type lambdas (without language extensions), but it doesn’t suffice to make sure all programs run to completion without error because it still allows bottoms (not to mention all the unsafe functions). It provides the weaker guarantee that, if a program runs to completion without using an unsafe function, it will remain type-correct. Under GHC, with sufficient language extensions, the type system does become Turing complete. I don't think it allows ill-typed programs through; I think the most you can do is throw the compiler into an infinite loop.