Search code examples
haskellfunctional-programmingtreefold

Best way to name a tree (with children) structure with no duplicate names in Haskell


Let's say I have the following data structure.

data Tree = Tree
  { name        :: String
  , children    :: [Tree]
  , ...
  }

My goal is to be able to map over a list of trees and their children, so that I can name each tree uniquely, hence the Map structure in the following code which represents the number of times a particular name has been used, in other words, Map Name Count. So if I have function baseName :: SystemTree -> String which returns the unnumbered name based on the unlisted attributes, it can be combined with the number in the map, such that even if a baseName is reused no name is ever used twice.

nameSystemTrees :: Map String Int -> [Tree] -> (Map String Int, [Tree])
nameSystemTrees nameState trees =
  ...

My question is, what is the best way to approach this in Haskell? Is it possible to use Foldable here? I noticed there's the Data.Tree package but unfortunately I already have a lot of custom code to construct these trees so I think it would take some work to be able to use the constructors in that package.


Solution

  • Well, you can't use Foldable (or the related class Traversable) because these classes are for types of kind * -> *. That is, a Foldable instance can only be defined for a type like data Tree a = ... that is parameterized in another type a, but your data Tree = ... is unparameterized.

    What you can do is write a function that traverses your tree applying a monadic action to each node, sort of a mapM tailored to your tree that maps a per-node action across the whole tree:

    mapTreeM :: Monad m => (Tree -> m Tree) -> Tree -> m Tree
    mapTreeM f = mtm  -- @f@ is the per-node action, @mtm@ the whole-tree action
      where mtm tree = do
              -- apply node action @f@ to root node
              tree' <- f tree
              -- recurse over children with @mtm@
              children' <- mapM mtm (children tree')
              -- update the children
              return $ tree' { children = children' }
    

    Now, this can apply any monadic action, including a State-based monadic action that assigns a numbered suffix, with a separate counter for each name. This is, given:

    data Tree = Tree
      { name :: String
      , children :: [Tree]
      } deriving (Show, Eq)
    

    you can define the node-renamer:

    uniquifyNode :: Tree -> State (Map String Int) Tree
    uniquifyNode node = do
      let nm = name node
      -- get current count for this name
      n <- gets (Map.findWithDefault 1 nm)
      -- store an updated count
      modify (Map.insert nm (n+1))
      -- return uniquified name
      return (node { name = nm ++ show n })
    

    and combine the two to create a tree-renamer:

    uniquifyTree :: Tree -> Tree
    uniquifyTree t = evalState (mapTreeM uniquifyNode t) Map.empty
    

    and test it on a tree:

    t0 :: Tree
    t0 = Tree "a" [ Tree "a" []
                  , Tree "b" [ Tree "a" []
                             , Tree "b" []
                             , Tree "c" []
                 ]
          , Tree "c" [ Tree "a" [] ]
          ]
    

    like so:

    > uniquifyTree t0
    

    which prints a tree equivalent to:

    t1 :: Tree
    t1 = Tree "a1" [ Tree "a2" []
                   , Tree "b1" [ Tree "a3" []
                               , Tree "b2" []
                               , Tree "c1" []
                               ]
                  , Tree "c2" [ Tree "a4" [] ]
                  ]
    

    Note that mapTreeM is essentially equivalent to your mapTree, and you can define mapTree in terms of mapTreeM using runState and state which don't actually do anything except wrap and unwrap data types:

    mapTree :: ((a, Tree) -> (a, Tree)) -> (a, Tree) -> (a, Tree)
    mapTree f (a, t) = let (t', a') = runState (mapTreeM g t) a in (a', t')
      where g t = state (\a -> let (a', t') = f (a, t) in (t', a'))
    

    So, structurally, this isn't much different from what you've already done. You just reinvented the state monad (as (a, Tree) -> (a, Tree)) and wrote a sort of custom mapM to traverse the tree without making the monadic action general.

    One thing about the explicit monadic version is that you can use it with some other monadic actions. Here are some examples:

    > -- replace all names with "foo" (Identity action)
    > import Data.Functor.Identity
    > runIdentity $ mapTreeM (\(Tree n c) -> Identity (Tree "foo" c)) t0
    > -- read the names from a file (IO action)
    > import System.IO
    > withFile "/usr/share/dict/words" ReadMode $ 
        \h -> mapTreeM (\(Tree n c) -> flip Tree c <$> hGetLine h) t0    
    > -- get a list of names in order (Writer action)
    > import Control.Monad.Writer
    > execWriter $ mapTreeM (\t@(Tree n _) -> tell [n] >> return t) t0
    

    Anyway, the full program is:

    import Control.Monad.State
    import Data.Map.Strict (Map)
    import qualified Data.Map.Strict as Map
    
    data Tree = Tree
      { name :: String
      , children :: [Tree]
      } deriving (Show, Eq)
    
    mapTreeM :: Monad m => (Tree -> m Tree) -> Tree -> m Tree
    mapTreeM f = mtm
      where mtm tree = do
              tree' <- f tree
              children' <- mapM mtm (children tree')
              return $ tree' { children = children' }
    
    uniquifyNode :: Tree -> State (Map String Int) Tree
    uniquifyNode node = do
      let nm = name node
      n <- gets (Map.findWithDefault 1 nm)
      modify (Map.insert nm (n+1))
      return (node { name = nm ++ show n })
    
    uniquifyTree :: Tree -> Tree
    uniquifyTree t = evalState (mapTreeM uniquifyNode t) Map.empty
    
    t0 :: Tree
    t0 = Tree "a" [ Tree "a" []
                  , Tree "b" [ Tree "a" []
                             , Tree "b" []
                             , Tree "c" []
                             ]
                  , Tree "c" [ Tree "a" [] ]
                  ]
    
    t1 :: Tree
    t1 = Tree "a1" [ Tree "a2" []
                   , Tree "b1" [ Tree "a3" []
                               , Tree "b2" []
                               , Tree "c1" []
                               ]
                  , Tree "c2" [ Tree "a4" [] ]
                  ]
    
    main = print $ uniquifyTree t0 == t1