Search code examples
haskellpattern-matchinggadt

The pattern with functions like `bool`, `either`, etc


I recently learned about GADTs and their notation:

E.g.

data Maybe a where
  Nothing :: Maybe a
  Just    :: a -> Maybe a

data Either a b where
  Left  :: a -> Either a b
  Right :: b -> Either a b

data Bool where
  False :: Bool
  True  :: Bool

Now I noticed a similiarity to functions like bool, and either, which is basically just like the GADT definition:

  1. taking every line as an argument
  2. replacing the actual type with the next letter of the alphabet
  3. and finally returning a function Type -> (the letter of step 2)

E.g.

maybe  :: b -> (a -> b) -> Maybe a -> b
either :: (a -> c) -> (b -> c) -> Either a b -> c
bool   :: a -> a -> Bool -> a

This also includes foldr, but I noticed that e.g. Tuple doesn't have such a function, though you could easily define it:

tuple :: (a -> b -> c) -> (a,b) -> c
tuple f (x,y) = f x y

What is this pattern? It seems to me these functions alleviate the need for pattern matching (because they give a general way for each case) and thus every function operating on the type can be defined in terms of this function.


Solution

  • First, the types you mention are not really GADTs, they are plain ADTs, since the return type of each constructor is always T a. A proper GADT would be something like

    data T a where
       K1 :: T Bool  -- not T a
    

    Anyway, the technique you mention is a well known method to encode algebraic data types into (polymorphic) functions. It goes under many names, like Church encoding, Boehm-Berarducci encoding, endcoding as a catamorphism, etc. Sometimes the Yoneda lemma is used to justify this approach, but there's no need to understand the category-theoretic machinery to understand the method.

    Basically, the idea is the following. All the ADTs can be generated by

    • product types (,) a b
    • sum types Either a b
    • arrow types a -> b
    • unit type ()
    • void type Void (rarely used in Haskell, but theoretically nice)
    • variables (if the type bing defined has parameters)
    • possibly, basic types (Integer, ...)
    • type level-recursion

    Type level recursion is used when some value constructor takes the type which is being defined as an argument. The classic example is Peano-style naturals:

    data Nat where
       O :: Nat
       S :: Nat -> Nat
         -- ^^^ recursive!
    

    Lists are also common:

    data List a where
       Nil :: List a
       Cons :: a -> List a -> List a
                 -- ^^^^^^ recursive!
    

    Types like Maybe a, pairs, etc. are non recursive.

    Note that each ADT, recursive or not, can be reduced to a single constructor with a sigle argument, by summing (Either) over all the constructors, and multiplying all the arguments. For instance, Nat is isomorphic to

    data Nat1 where
      O1 :: () -> Nat
      S1 :: Nat -> Nat
    

    which is isomorphic to

    data Nat2 where K2 :: (Either () Nat) -> Nat
    

    Lists become

    data List1 a where K1 :: (Either () (a, List a)) -> List a
    

    The steps above make use of the algebra of types, which makes the sum and products of types obey the same rules as high school algebra, while a -> b behaves like the exponential b^a.

    Hence, we can write any ADT in the form

    -- pseudo code
    data T where
       K :: F T -> T
    type F k = .....
    

    For instance

    type F_Nat k = Either () k      -- for T = Nat
    type F_List_a k = Either () (a, k) -- for T = List a
    

    (Note that the latter type function F depends on a, but it's not important right now.)

    Non recursive types will not use k:

    type F_Maybe_a k = Either () a     -- for T = Maybe a
    

    Note that constructor K above makes the type T isomorphic to F T (let's ignore the lifting / extra bottom introduced by it). Essentially, we have that

    Nat ~= F Nat = Either () Nat
    List a ~= F (List a) = Either () (a, List a)
    Maybe a ~= F (Maybe a) = Either () a
    

    We can even formalize this further by abstracting from F

    newtype Fix f = Fix { unFix :: f (Fix f) }
    

    By definition Fix F will now be isomorphic to F (Fix F). We could let

    type Nat = Fix F_Nat
    

    (In Haskell, we need a newtype wrapper around F_Nat, which I omit for clarity.)

    Finally, the general encoding, or catamorphism, is:

    cata :: (F k -> k) -> Fix F -> k
    

    This assumes that F is a functor.

    For Nat, we get

    cata :: (Either () k -> k) -> Nat -> k
    -- isomorphic to
    cata :: (() -> k, k -> k) -> Nat -> k
    -- isomorphic to
    cata :: (k, k -> k) -> Nat -> k
    -- isomorphic to
    cata :: k -> (k -> k) -> Nat -> k
    

    Note the "high school albegra" steps, where k^(1+k) = k^1 * k^k, hence Either () k -> k ~= (() -> k, k -> k).

    Note that we get two arguments, k and k->k which correspond to O and S. This is not a coincidence -- we summed over all the constructors. This means that cata expects to be passed the value of type k which "plays the role of O" there, and then the value of type k -> k which plays the role of S.

    More informally, cata is telling us that, if we want to map a natural in k, we only have to state what is the "zero" inside k and how to take the "successor" in k, and then every Nat can be mapped consequently.

    For lists we get:

    cata :: (Either () (a, k) -> k) -> List a -> k
    -- isomorphic to
    cata :: (() -> k, (a, k) -> k) -> List a -> k
    -- isomorphic to
    cata :: (k, a -> k -> k) -> List a -> k
    -- isomorphic to
    cata :: k -> (a -> k -> k) -> List a -> k
    

    which is foldr.

    Again, this is cata telling us that, if we state how to take the "empty list" in k and to "cons" a and k inside k, we can map any list in k.

    Maybe a is the same:

    cata :: (Either () a -> k) -> Maybe a -> k
    -- isomorphic to
    cata :: (() -> k, a -> k) -> Maybe a -> k
    -- isomorphic to
    cata :: (k, a -> k) -> Maybe a -> k
    -- isomorphic to
    cata :: k -> (a -> k) -> Maybe a -> k
    

    If we can map Nothing in k, and perform Just mapping a in k, the we can map any Maybe a in k.

    If we try to apply the same approach to Bool and (a,b) we reach the functions which were posted in the questions.

    More advanced theoretical topics to look up:

    • (initial) F-algebras in category theory
    • eliminators / recursors / induction principles in type theory (these can be applied to GADTs as well)