haskell functional-programming tree fold

Best way to name a tree (with children) structure with no duplicate names in Haskell

Let's say I have the following data structure.

data Tree = Tree
  { name        :: String
  , children    :: [Tree]
  , ...
  }

My goal is to be able to map over a list of trees and their children, so that I can name each tree uniquely, hence the Map structure in the following code which represents the number of times a particular name has been used, in other words, Map Name Count. So if I have function baseName :: SystemTree -> String which returns the unnumbered name based on the unlisted attributes, it can be combined with the number in the map, such that even if a baseName is reused no name is ever used twice.

nameSystemTrees :: Map String Int -> [Tree] -> (Map String Int, [Tree])
nameSystemTrees nameState trees =
  ...

My question is, what is the best way to approach this in Haskell? Is it possible to use Foldable here? I noticed there's the Data.Tree package but unfortunately I already have a lot of custom code to construct these trees so I think it would take some work to be able to use the constructors in that package.

Solution

Well, you can't use Foldable (or the related class Traversable) because these classes are for types of kind * -> *. That is, a Foldable instance can only be defined for a type like data Tree a = ... that is parameterized in another type a, but your data Tree = ... is unparameterized.

What you can do is write a function that traverses your tree applying a monadic action to each node, sort of a mapM tailored to your tree that maps a per-node action across the whole tree:

mapTreeM :: Monad m => (Tree -> m Tree) -> Tree -> m Tree
mapTreeM f = mtm  -- @f@ is the per-node action, @mtm@ the whole-tree action
  where mtm tree = do
          -- apply node action @f@ to root node
          tree' <- f tree
          -- recurse over children with @mtm@
          children' <- mapM mtm (children tree')
          -- update the children
          return $ tree' { children = children' }

Now, this can apply any monadic action, including a State-based monadic action that assigns a numbered suffix, with a separate counter for each name. This is, given:

data Tree = Tree
  { name :: String
  , children :: [Tree]
  } deriving (Show, Eq)

you can define the node-renamer:

uniquifyNode :: Tree -> State (Map String Int) Tree
uniquifyNode node = do
  let nm = name node
  -- get current count for this name
  n <- gets (Map.findWithDefault 1 nm)
  -- store an updated count
  modify (Map.insert nm (n+1))
  -- return uniquified name
  return (node { name = nm ++ show n })

and combine the two to create a tree-renamer:

uniquifyTree :: Tree -> Tree
uniquifyTree t = evalState (mapTreeM uniquifyNode t) Map.empty

and test it on a tree:

t0 :: Tree
t0 = Tree "a" [ Tree "a" []
              , Tree "b" [ Tree "a" []
                         , Tree "b" []
                         , Tree "c" []
             ]
      , Tree "c" [ Tree "a" [] ]
      ]

like so:

> uniquifyTree t0

which prints a tree equivalent to:

t1 :: Tree
t1 = Tree "a1" [ Tree "a2" []
               , Tree "b1" [ Tree "a3" []
                           , Tree "b2" []
                           , Tree "c1" []
                           ]
              , Tree "c2" [ Tree "a4" [] ]
              ]

Note that mapTreeM is essentially equivalent to your mapTree, and you can define mapTree in terms of mapTreeM using runState and state which don't actually do anything except wrap and unwrap data types:

mapTree :: ((a, Tree) -> (a, Tree)) -> (a, Tree) -> (a, Tree)
mapTree f (a, t) = let (t', a') = runState (mapTreeM g t) a in (a', t')
  where g t = state (\a -> let (a', t') = f (a, t) in (t', a'))

So, structurally, this isn't much different from what you've already done. You just reinvented the state monad (as (a, Tree) -> (a, Tree)) and wrote a sort of custom mapM to traverse the tree without making the monadic action general.

One thing about the explicit monadic version is that you can use it with some other monadic actions. Here are some examples:

> -- replace all names with "foo" (Identity action)
> import Data.Functor.Identity
> runIdentity $ mapTreeM (\(Tree n c) -> Identity (Tree "foo" c)) t0
> -- read the names from a file (IO action)
> import System.IO
> withFile "/usr/share/dict/words" ReadMode $ 
    \h -> mapTreeM (\(Tree n c) -> flip Tree c <$> hGetLine h) t0    
> -- get a list of names in order (Writer action)
> import Control.Monad.Writer
> execWriter $ mapTreeM (\t@(Tree n _) -> tell [n] >> return t) t0

Anyway, the full program is:

import Control.Monad.State
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

data Tree = Tree
  { name :: String
  , children :: [Tree]
  } deriving (Show, Eq)

mapTreeM :: Monad m => (Tree -> m Tree) -> Tree -> m Tree
mapTreeM f = mtm
  where mtm tree = do
          tree' <- f tree
          children' <- mapM mtm (children tree')
          return $ tree' { children = children' }

uniquifyNode :: Tree -> State (Map String Int) Tree
uniquifyNode node = do
  let nm = name node
  n <- gets (Map.findWithDefault 1 nm)
  modify (Map.insert nm (n+1))
  return (node { name = nm ++ show n })

uniquifyTree :: Tree -> Tree
uniquifyTree t = evalState (mapTreeM uniquifyNode t) Map.empty

t0 :: Tree
t0 = Tree "a" [ Tree "a" []
              , Tree "b" [ Tree "a" []
                         , Tree "b" []
                         , Tree "c" []
                         ]
              , Tree "c" [ Tree "a" [] ]
              ]

t1 :: Tree
t1 = Tree "a1" [ Tree "a2" []
               , Tree "b1" [ Tree "a3" []
                           , Tree "b2" []
                           , Tree "c1" []
                           ]
              , Tree "c2" [ Tree "a4" [] ]
              ]

main = print $ uniquifyTree t0 == t1