The syntax for algebraic data types is very similar to the syntax of Backus–Naur Form, which is used to describe context-free grammars. That got me thinking: if we think of the Haskell type checker as a parser for a language represented as an algebraic data type (with nullary type constructors representing the terminal symbols, for example), is the set of all languages accepted the same as the set of context-free languages? Also, with this interpretation, what set of formal languages can GADTs accept?
First of all, data types do not always describe a set of strings (i.e., a language). A list type does, but a tree type does not. One might counter that we could "flatten" the trees into lists and take the result as their language. But what about data types like
data F = F Int (Int -> Int)
or, worse,
data R = R (R -> Int)
?
Polynomial types (types without -> inside) roughly describe trees, which can be flattened (visited in order), so let's use those as an example.
As you have observed, writing a CFG as a (polynomial) type is easy, since you can exploit recursion:
data A = A1 Int A | A2 Int B
data B = B1 Int B Char | B2
Above, A expresses { Int^m Char^n | m > n }.
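To make the flattening concrete, here is a minimal sketch. The Tok type and the functions flattenA and flattenB are hypothetical helpers, not part of the original definitions; they erase the payload values and keep only which kind of leaf was visited.

```haskell
-- Terminal symbols of the flattened language (hypothetical helper type).
data Tok = TInt | TChar deriving (Eq, Show)

data A = A1 Int A | A2 Int B
data B = B1 Int B Char | B2

-- In-order flattening: record the kind of each leaf, not its value.
flattenA :: A -> [Tok]
flattenA (A1 _ a) = TInt : flattenA a
flattenA (A2 _ b) = TInt : flattenB b

flattenB :: B -> [Tok]
flattenB (B1 _ b _) = TInt : flattenB b ++ [TChar]
flattenB B2         = []

-- A sample term; it flattens to three Int tokens followed by one Char token.
example :: A
example = A1 0 (A2 1 (B1 2 B2 'x'))
```

Every A term contributes at least one Int before B starts pairing each Int with a Char, which is where the strict inequality m > n comes from.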
GADTs go well beyond context-free languages.
data Z
data S n
data ListN a n where
  L1 :: ListN a Z
  L2 :: a -> ListN a n -> ListN a (S n)
data A
data B
data C
data ABC where
  ABC :: ListN A n -> ListN B n -> ListN C n -> ABC
Above, ABC expresses the (flattened) language A^n B^n C^n, which is not context-free.
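As a rough, self-contained sketch of how this plays out, the version below gives A, B and C one constructor each so that values can actually be written down (an assumption; the declarations above leave them empty). The helpers toList and flattenABC and the value balanced are likewise hypothetical names introduced for illustration.

```haskell
{-# LANGUAGE GADTs #-}

-- Type-level Peano naturals, used only as indices.
data Z
data S n

-- Lists whose length is tracked in the type.
data ListN a n where
  L1 :: ListN a Z
  L2 :: a -> ListN a n -> ListN a (S n)

-- Assumption: one constructor per symbol, so that terms exist.
data A = A deriving Show
data B = B deriving Show
data C = C deriving Show

-- The shared index n forces all three lists to have the same length.
data ABC where
  ABC :: ListN A n -> ListN B n -> ListN C n -> ABC

-- Forget the length index.
toList :: ListN a n -> [a]
toList L1        = []
toList (L2 x xs) = x : toList xs

-- Flatten a term into its string of symbols.
flattenABC :: ABC -> String
flattenABC (ABC xs ys zs) =
  concatMap show (toList xs) ++ concatMap show (toList ys) ++ concatMap show (toList zs)

-- Accepted by the type checker: all three lists have length two.
balanced :: ABC
balanced = ABC (L2 A (L2 A L1)) (L2 B (L2 B L1)) (L2 C (L2 C L1))

-- Rejected by the type checker (lengths differ), so left commented out:
-- unbalanced = ABC (L2 A L1) L1 L1
```

Here the "parsing" is done entirely by unification of the index n, which is why no context-free grammar is needed to enforce the three-way length match.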
You are pretty much unrestricted with GADTs, since it's easy to encode arithmetic with them. That is, you can build a type Plus a b c which is non-empty iff c = a + b with Peano naturals. You can also build a type Halt n m which is non-empty iff the Turing machine n halts on input m. So, you can build a language { A^n B^m proof | n halts on m, and proof proves it } which is recursive (and not in any simpler class, roughly).
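One possible encoding of Plus is sketched below. The constructor names PlusZ and PlusS and the helper plusDepth are assumptions made for illustration, and "non-empty" should be read as ignoring bottom values such as undefined, which inhabit every Haskell type.

```haskell
{-# LANGUAGE GADTs #-}

-- Type-level Peano naturals.
data Z
data S n

-- Plus a b c has a (total) inhabitant exactly when c = a + b.
data Plus a b c where
  PlusZ :: Plus Z b b                          -- 0 + b = b
  PlusS :: Plus a b c -> Plus (S a) b (S c)    -- (1 + a) + b = 1 + (a + b)

-- A witness that 2 + 1 = 3.
twoPlusOne :: Plus (S (S Z)) (S Z) (S (S (S Z)))
twoPlusOne = PlusS (PlusS PlusZ)

-- The witness is as deep as the first summand is large.
plusDepth :: Plus a b c -> Int
plusDepth PlusZ     = 0
plusDepth (PlusS p) = 1 + plusDepth p

-- No term of this type can be built from the constructors, so it stays commented out:
-- bogus :: Plus (S Z) (S Z) (S Z)
-- bogus = PlusS PlusZ
```

A Halt n m type would follow the same pattern in principle, with constructors mirroring single machine steps, though writing it out takes considerably more machinery.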
At the moment, I do not know whether you can describe recursively enumerable (computably enumerable) languages in GADTs. Even in the halting problem example, I have to include the "proof" term inside the GADT to make it work.
Intuitively, if you have a string of length n
and you want to check it a against a GADT, you can
build all the GADT terms of depth n
, flatten them, and then compare to the string. This should
prove that such language is always recursive. However, existential types make this tree building
approach quite tricky, so I do not have a definite answer right now.
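For the easy polynomial case only, the enumerate-and-compare idea can be sketched as follows. It reuses the shape of the A/B grammar with the Int and Char payloads erased; all names here (ShapeA, ShapeB, shapesA, shapesB, member) are hypothetical, and the depth bound of length + 1 is specific to this grammar and to counting the SB2 leaf as depth 1. It does not address the existential-type difficulty.

```haskell
data Tok = TInt | TChar deriving (Eq, Show)

-- Shapes of the A and B types with payload values erased.
data ShapeA = SA1 ShapeA | SA2 ShapeB
data ShapeB = SB1 ShapeB | SB2

flattenA :: ShapeA -> [Tok]
flattenA (SA1 a) = TInt : flattenA a
flattenA (SA2 b) = TInt : flattenB b

flattenB :: ShapeB -> [Tok]
flattenB (SB1 b) = TInt : flattenB b ++ [TChar]
flattenB SB2     = []

-- All shapes of depth at most d (finite lists, since d strictly decreases).
shapesA :: Int -> [ShapeA]
shapesA d
  | d <= 0    = []
  | otherwise = map SA1 (shapesA (d - 1)) ++ map SA2 (shapesB (d - 1))

shapesB :: Int -> [ShapeB]
shapesB d
  | d <= 0    = []
  | otherwise = SB2 : map SB1 (shapesB (d - 1))

-- Brute-force membership test: enumerate, flatten, compare.
-- Every constructor except SB2 emits at least one token, so a term that
-- flattens to a string of length n has depth at most n + 1.
member :: [Tok] -> Bool
member s = any ((== s) . flattenA) (shapesA (length s + 1))
```

Because the search space is finite for each input, member always terminates, which is the sense in which the language is recursive here.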