Search code examples
haskelltypeclass

Is there a typeclass that changes the shape of a hierarchy?


I'm working with some hierarchical data, a contrived example here:

data Item a =
    Leaf a
    | ListItems [Item a]
    | DoubleItem (Item a) (Item a)
    | TripleItem (Item a) (Item a) (Item a)

I find myself often writing code like this:

removeDoubles :: Item a -> Item a
removeDoubles x@(Leaf _) = x
removeDoubles (ListItems xs) = ListItems $ removeDoubles <$> xs
removeDoubles (DoubleItem x y) = ListItems [removeDoubles x, removeDoubles y]
removeDoubles (TripleItem x y z) = TripleItem (removeDoubles x) (removeDoubles y) (removeDoubles z)

removeTriples :: Item a -> Item a
removeTriples x@(Leaf _) = x
removeTriples (ListItems xs) = ListItems $ removeTriples <$> xs
removeTriples (DoubleItem x y) = DoubleItem (removeTriples x) (removeTriples y)
removeTriples (TripleItem x y z) = ListItems [removeDoubles x, removeDoubles y, removeDoubles z]

There's obviously some overlap here between these two functions, and this approach is also not very robust if I start adding more constructors to Item. Is there some typeclass I should be using that would help reduce this repetition? I'm imaginging something like:

removeDoubles :: Item a -> Item a
removeDoubles (DoubleItem x y) = ListItems [removeDoubles x, removeDoubles y]
removeDoubles x = traverseHierarchy removeDoubles x

Functor, Traversable etcetera seem to be designed around changing the type of a, whereas I'm less interested in mapping a -> something and more interested in changing the shape of this hierarchy.

Thanks for any assistance!


Solution

  • I think you are looking for a catamorphism. With more formality and library tooling, the idea below can be generalized and also automated (more on that at the end of this answer). For your specific task, the code that could be automatically generated is similar to

    data Item a =
      Leaf a
      | ListItems [Item a]
      | DoubleItem (Item a) (Item a)
      | TripleItem (Item a) (Item a) (Item a)
    
    cata :: (a -> b) -- Leaf
         -> ([b] -> b) -- ListItems
         -> (b -> b -> b) -- DoubleItem
         -> (b -> b -> b -> b) -- TripleItem
         -> Item a
         -> b
    cata leaf list double triple = go
      where go (Leaf x) = leaf x
            go (ListItems xs) = list (map go xs)
            go (DoubleItem x y) = double (go x) (go y)
            go (TripleItem x y z) = triple (go x) (go y) (go z)
    

    Here cata is generalizing the idea of recursively folding over the structure, and has nothing to do specifically with modifying the hierarchy, for example by removing doubles. But you can use this general machinery to achieve both of your specific goals easily, and without repetition or explicit recursion:

    removeDoubles = cata Leaf ListItems lift TripleItem
      where lift x y = ListItems [x, y]
    
    removeTriples = cata Leaf ListItems DoubleItem lift
      where lift x y z = ListItems [x, y, z]
    

    cata is part of a larger family of concepts called "recursion schemes". I'm sure there are numerous articles attempting to explain these concepts, but An Introduction to Recursion Schemes is one I liked. The Haskell library implementing these concepts, including automatic generation of the cata function for your types, is recursion-schemes. (Really it doesn't generate cata; it generates typeclass instances for your type that allow you to use its predefined cata function, and many more)


    To use the cata implementation in recursion-schemes instead of a bespoke implementation for your type, we need several GHC extensions:

    {-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable -#}
    {-# LANGUAGE TemplateHaskell, TypeFamilies #-}
    
    import Data.Functor.Foldable.TH (makeBaseFunctor)
    import Data.Functor.Foldable (cata, embed)
    
    data Item a =
      Leaf a
      | ListItems [Item a]
      | DoubleItem (Item a) (Item a)
      | TripleItem (Item a) (Item a) (Item a)
    
    makeBaseFunctor ''Item
    

    makeBaseFunctor uses Template Haskell to define a new type ItemF, based on Item but more general, called its "base functor". Its constructors are similar, but have names suffixed with F, and the fields that (in Item) recursively contain an Item have a more general type, whose details I won't go into.

    Finally we're ready to implement your removeDoubles function with the minimum of detail: only describe the broad idea and let recursion-schemes do the rest of the work. Note that this is even less duplication than you hoped for in your question: the recursive calls to removeDoubles are all made implicit, and we can just specify what to do with a single level of the hierarchy.

    removeDoubles :: Item a -> Item a
    removeDoubles = cata go
      where go (DoubleItemF x y) = ListItems [x, y]
            go x = embed x
    

    This version of cata, instead of taking four separate functions for the four constructors of Item, takes one function, and passes that function an ItemF argument, which can be pattern-matched similarly to an Item.

    We match the DoubleItemF first, and do the "special" work there. Then in the catch-all clause afterwards, we simply use embed to convert the ItemF argument back to an Item without further modification. In both clauses, the x and y arguments we received have already been recursively handled by cata before we see them, so there's no need to do anything to their internals.