Search code examples
haskellabstract-syntax-treerecursion-schemes

Factoring out recursion in a complex AST


For a side project I am working on I currently have to deal with an abstract syntax tree and transform it according to rules (the specifics are unimportant).

The AST itself is nontrivial, meaning it has subexpressions which are restricted to some types only. (e.g. the operator A must take an argument which is of type B only, not any Expr. A drastically simplified reduced version of my datatype looks like this:

data Expr  = List [Expr]
           | Strange Str
           | Literal Lit
data Str   = A Expr
           | B Expr
           | C Lit
           | D String
           | E [Expr]
data Lit   = Int Int
           | String String

My goal is to factor out the explicit recursion and rely on recursion schemes instead, as demonstrated in these two excellent blog posts, which provide very powerful general-purpose tools to operate on my AST. Applying the necessary factoring, we end up with:

data ExprF a = List [a]
             | Strange (StrF a)
             | Literal (LitF a)
data StrF a  = A a
             | B a
             | C (LitF a)
             | D String
             | E [a]
data LitF a  = Int Int
             | String String

If I didn't mess up, type Expr = Fix ExprF should now be isomorphic to the previously defined Expr.

However, writing cata for these cases becomes rather tedious, as I have to pattern match B a :: StrF a inside of an Str :: ExprF a for cata to be well-typed. For the entire original AST this is unfeasible.

I stumbled upon fixing GADTs, which seems to me like it is a solution to my problem, however the user-unfriendly interface of the duplicated higher-order type classes etc. is quite the unneccessary boilerplate.

So, to sum up my questions:

  1. Is rewriting the AST as a GADT the correct way to go about this?
  2. If yes, how could I transform the example into a well-working version? On a second note, is there better support for higher kinded Functors in GHC now?

Solution

  • If you've gone through the effort of to separate out the recursion in your data type, then you can just derive Functor and you're done. You don't need any fancy features to get the recursion scheme. (As a side note, there's no reason to parameterize the Lit data type.)

    The fold is:

    newtype Fix f = In { out :: f (Fix f) }
    
    gfold :: (Functor f) => (f a -> a) -> Fix f -> a
    gfold alg = alg . fmap (gfold alg) . out
    

    To specify the algebra (the alg parameter), you need to do a case analysis against ExprF, but the alternative would be to have the fold have a dozen or more parameters: one for each data constructor. That wouldn't really save you much typing and would be much harder to read. If you want (and this may require rank-2 types in general), you can package all those parameters up into a record and then you could use record update to update "pre-made" records that provide "default" behavior in various circumstances. There's an old paper Dealing with Large Bananas that takes an approach like this. What I'm suggesting, to be clear, is just wrapping the gfold function above with a function that takes a record, and passes in an algebra that will do the case analysis and call the appropriate field of the record for each case.

    Of course, you could use GHC Generics or the various "generic/polytypic" programming libraries like Scrap Your Boilerplate instead of this. You are basically recreating what they do.