haskell algebraic-data-types parametric-polymorphism

Recursive algebraic data types via polymorphism in Haskell

I am trying to understand the definition, de- and encoding of recursive algebraic data types given the functionality of universal polymorphism. As an example, I tried to implement the recursive type of binary trees via

data BTAlg x = Empty | Leaf x x
type BT = forall z. ((BTAlg z) -> z) -> z

the intuition being that the type of binary trees should be initial among all types T equipped with a constant e: T and a binary operation m: T -> T -> T, i.e. the "initial module" over the functor BTAlg. In other words, an element of BT is a way of assigning, for any such module T, an element of T.

The module structure on BT itself can be defined via

initial_module :: (BTAlg BT) -> BT
initial_module = \s -> case s of
  Empty -> (\f -> (f Empty))
  Leaf x y -> (\f -> (f (Leaf (x f) (y f))))

As a step towards pattern matching for BT, I now want to apply an element x:BT to the type BT itself, which I think of as some kind of encoding-decoding of x.

decode_encode :: BT -> BT
decode_encode x = x initial_module

However, this code results in a type error I cannot handle:

Couldn't match expected type `(BTAlg z -> z) -> z'
            with actual type `BT'
Expected type: BTAlg ((BTAlg z -> z) -> z) -> (BTAlg z -> z) -> z
  Actual type: BTAlg BT -> (BTAlg z0 -> z0) -> z0
In the first argument of `x', namely `initial_module'
In the expression: x initial_module

What's wrong here? I don't know why the type checker does not recognize that the universal type parameter z has to be specialized to BT in x in order for x to be applicable to initial_module; writing (x :: ((BTAlg BT) -> BT) -> BT) initial_module doesn't help either.

Addendum concerning the motivation for defining decode_encode: I want to convince myself that BT is in fact 'isomorphic' to the standard realization

data BTStd = StdEmpty | StdLeaf BTStd BTStd

of binary trees; while I don't know how to make this precise inside Haskell, a starter would be to construct maps BT -> BTStd and BTStd -> BT going back and forth between the two realizations.

The definition of toStandard: BT -> BTStd is an application of the universal property of BT to the canonical BTAlg module structure on BTStd:

std_module :: (BTAlg BTStd) -> BTStd
std_module s = case s of 
  Empty -> StdEmpty
  Leaf x y -> StdLeaf x y

toStandard: BT -> BTStd
toStandard x = x std_module

For the decoding function fromStandard: BTStd -> BT I would like to do the following:

fromStandard :: BTStd -> BT
fromStandard s = case s of 
  StdEmpty -> initial_module Empty
  StdLeaf x y -> initial_module (Leaf (fromStandard x) (fromStandard y))

However, this gives the same typing problems as the direct definition of decode_encode above:

Couldn't match expected type `BT'
            with actual type `(BTAlg z0 -> z0) -> z0'
In the return type of a call of `fromStandard'
Probable cause: `fromStandard' is applied to too few arguments
In the first argument of `Leaf', namely `(fromStandard x)'
In the first argument of `initial_module', namely
  `(Leaf (fromStandard x) (fromStandard y))'

Thank you!

Solution

If you look at the inferred type for decode_encode

:t decode_encode
> decode_encode :: ((BTAlg BT -> (BTAlg z -> z) -> z) -> t) -> t

it's clear that GHC has lost quite a bit of polymorphism. I'm not completely sure of the details here—this code requires ImpredicativeTypes to compile which is usually where my understanding starts to break down. However, there is a standard way for keeping polymorphism around: use a newtype

newtype BT = BT { runBT :: forall z. (BTAlg z -> z) -> z }

The newtype establishes an isomorphism BT ~ forall z . (BTAlg z -> z) -> z witnessed by BT and runBT. As long as we put those witnesses in the right spots we can go forward.

data    BTAlg x = Empty    | Leaf    x     x
data    BTStd   = StdEmpty | StdLeaf BTStd BTStd
newtype BT      = BT { runBT :: forall z. ((BTAlg z) -> z) -> z }

initial_module :: BTAlg BT -> BT
initial_module s = case s of
  Empty -> BT $ \f -> (f Empty)
  Leaf x y -> BT $ \f -> (f (Leaf (runBT x f) (runBT y f)))

std_module :: (BTAlg BTStd) -> BTStd
std_module s = case s of 
  Empty -> StdEmpty
  Leaf x y -> StdLeaf x y

toStandard :: BT -> BTStd
toStandard x = runBT x std_module

fromStandard :: BTStd -> BT
fromStandard s = case s of
  StdEmpty -> initial_module Empty
  StdLeaf x y -> initial_module (Leaf (fromStandard x) (fromStandard y))

In particular what's nice is that we use runBT to control when and how many times the type lambda in BT gets applied

decode_encode :: BT -> BT
decode_encode x = runBT x initial_module

One use of runBT corresponds to one unification of the quantified type.