
How to use the fold function in Haskell with other datatypes


Is there a universal way of thinking about how to create a fold function for a new data type?

For example, the fold function for the data type Tree is:

data Tree t = Leaf | Node t (Tree t) (Tree t)
              deriving (Eq,Ord,Show)

treeFold:: (a -> b -> b -> b) -> b -> Tree a -> b
treeFold f e Leaf = e
treeFold f e (Node x l r) = f x (treeFold f e l) (treeFold f e r)
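The two arguments of treeFold say what to replace each constructor with: e stands in for every Leaf, and f combines a node's value with the already-folded results of its two subtrees. Here is a small sketch of a few instantiations. The helper names sumTree, size and toList are made up for illustration and are not from the question.

```haskell
-- Tree and treeFold as defined in the question.
data Tree t = Leaf | Node t (Tree t) (Tree t)
  deriving (Eq, Ord, Show)

treeFold :: (a -> b -> b -> b) -> b -> Tree a -> b
treeFold _ e Leaf = e
treeFold f e (Node x l r) = f x (treeFold f e l) (treeFold f e r)

-- Sum of all elements: a Node adds its value to the sums of its subtrees.
sumTree :: Num a => Tree a -> a
sumTree = treeFold (\x l r -> x + l + r) 0

-- Number of Nodes: the stored value is ignored.
size :: Tree a -> Int
size = treeFold (\_ l r -> 1 + l + r) 0

-- In-order list of the elements.
toList :: Tree a -> [a]
toList = treeFold (\x l r -> l ++ [x] ++ r) []

main :: IO ()
main = do
  let t = Node 2 (Node 1 Leaf Leaf) (Node 3 Leaf Leaf) :: Tree Int
  print (sumTree t)  -- 6
  print (size t)     -- 3
  print (toList t)   -- [1,2,3]
```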

So how would I write the fold function for the following data type?

data Json a = Val a | Obj [(String, Json a)]

I know the type would have to take two functions, one for each of the cases Val and Obj. What do I have to consider while creating the fold? I hope my question makes sense. I've come across many different data types for which I was asked to write a fold function, and I can't seem to find the pattern.


Solution

  • As Willem Van Onsem pointed out in a (now-deleted) comment, what you are trying to implement is also called a catamorphism. I've written a bit about what you might call a beginner's view of catamorphisms in "Does each type have a unique catamorphism?". You can derive the catamorphism for a type (or show that none can exist) quite mechanically. If your type has N constructors, the fold function must take N+1 arguments: one value of your type, and one function for each constructor. Each such function takes one argument per field of its corresponding constructor (or, if the constructor has no fields, it is just an ordinary value, which you can think of as a 0-ary function), and returns a value of whatever type the catamorphism returns.

    It's complicated in words, so I'll copy the relevant code from the answer I linked above, as an exemplar:

    data X a b f = A Int b
                 | B
                 | C (f a) (X a b f)
                 | D a
    
    xCata :: (Int -> b -> r)
          -> r
          -> (f a -> r -> r)
          -> (a -> r)
          -> X a b f
          -> r
    xCata a b c d v = case v of
      A i x -> a i x
      B -> b
      C f x -> c f (xCata a b c d x)
      D x -> d x
    

    Observe that each of the functions (a, b, c, d) has one argument per field in the associated constructor. In most cases, you simply call the function with each of the constructor's fields...but what's up with the C case? Why don't we write c f x instead of c f (xCata a b c d x)? This is where the recursion happens: a catamorphism's job is to recursively traverse (fold) the entire tree represented by your ADT, turning each X a b f value into a result of type r. Happily, there's only one possible way to do that transformation: call xCata with the same set of functions you were passed to begin with.
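Applying the same recipe to the Json type from the question gives the sketch below. Json has two constructors, so the fold takes two functions plus the value. The one new wrinkle is that the recursive Json a field sits inside a list of pairs, so instead of calling the fold on the field directly we apply it to every value in the list and keep the keys. The names jsonFold, countVals and keys are my own, chosen for illustration.

```haskell
-- The Json type from the question.
data Json a = Val a | Obj [(String, Json a)]
  deriving Show

-- One function per constructor:
--   Val has one field of type a               -> (a -> r)
--   Obj has one field of type [(String, Json a)] -> ([(String, r)] -> r)
-- Every recursive Json a position (here, inside the list) becomes r.
jsonFold :: (a -> r) -> ([(String, r)] -> r) -> Json a -> r
jsonFold val obj = go
  where
    go (Val x)   = val x
    -- Fold every value of the association list, keeping its key.
    go (Obj kvs) = obj [ (k, go v) | (k, v) <- kvs ]

-- Example: count the Val leaves.
countVals :: Json a -> Int
countVals = jsonFold (const 1) (sum . map snd)

-- Example: collect every key, depth first.
keys :: Json a -> [String]
keys = jsonFold (const []) (concatMap (\(k, ks) -> k : ks))

main :: IO ()
main = do
  let doc = Obj [("x", Val 1), ("y", Obj [("z", Val 2), ("w", Val 3)])] :: Json Int
  print (countVals doc)  -- 3
  print (keys doc)       -- ["x","y","z","w"]
```

The list comprehension is just map over the pairs; fmap (fmap go) kvs would do the same thing, since fmap on a pair only touches its second component.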