Search code examples
haskellrecursionabstract-syntax-treecatamorphismrecursion-schemes

Is there something like cata but where you can match inner structure?


I have this language AST

data ExprF r = Const Int
              | Var   String
              | Lambda String r
              | EList [r]
              | Apply r r
 deriving ( Show, Eq, Ord, Functor, Foldable )

And I want to convert it to string

toString = cata $ \case
  Const x -> show x
  Var x -> x
  EList x -> unwords x
  Lambda x y -> unwords [x, "=>", y]
  Apply x y -> unwords [x, "(", y, ")"]

But when lambda is used in Apply I need the parentheses

(x => x)(1)

but I cannot match inner structure with cata

toString :: Fix ExprF -> String
toString = cata $ \case
  Const x -> show x
  Var x -> x
  Lambda x y -> unwords [x, "=>", y]
  Apply (Lambda{}) y -> unwords ["(", x, ")", "(", y, ")"]
  Apply x y -> unwords [x, "(", y, ")"]

Is there any better solution than para?

toString2 :: Fix ExprF -> String
toString2 = para $ \case
  Const x -> show x
  Var x -> x
  Lambda x (_,y) -> unwords [x, "=>", y]
  EList x -> unwords (snd <$> x)
  Apply ((Fix Lambda {}),x) (_,y) -> unwords ["(", x, ")", "(", y, ")"]
  Apply (_,x) (_,y) -> unwords [x, "(", y, ")"]

It looks uglier. Even it is needed only in one place I need to remove fst tuple parameters everywhere and I guess it will be slower.


Solution

  • As @chi, @DanielWagner and I pointed out in the comments, the way to do this sort of pretty-printing-with-parenthesisation in a structurally recursive manner is "the showsPrec approach".

    The big idea is not to fold up the syntax tree into a String, but into a function Bool -> String. This gives us a degree of context-sensitivity in the fold: we'll use that extra Bool parameter to keep track of whether we're currently in the context of the left-hand side of an application.

    parens x = "(" ++ x ++ ")"
    
    ppAlg :: ExprF (Bool -> String) -> (Bool -> String)
    ppAlg (Const x) isBeingApplied = show x
    ppAlg (Var x) isBeingApplied = x
    ppAlg (Lambda name body) isBeingApplied = p ("\\" ++ name ++ " -> " ++ body False)
        where p = if isBeingApplied then parens else id
    ppAlg (EList es) isBeingApplied = unwords (sequenceA es False)
    ppAlg (Apply fun arg) isBeingApplied = fun True ++ " " ++ arg False
    

    We pass values of isBeingApplied down the recursive calls depending on where we are in the syntax tree right now. Note that the only place we're passing down True is as an argument to fun in the body of the Apply case. Then, in the Lambda case, we inspect that argument. If the current term is the left-hand part of an application we parenthesise the lambda; if not we don't.

    At the top level, having folded up the whole tree into a function Bool -> String, we pass it an argument of False - we're not currently in the context of an application - to get a String out.

    pp :: Expr -> String
    pp ex = cata ppAlg ex False
    
    ghci> pp $ app (lam "x" (var "x")) (cnst 2)
    "(\\x -> x) 2"
    

    By replacing the Bool with an Int, this approach can be generalised to parenthesising operators with arbitrary precedences, as covered in @DanielWagner's linked answer.