Search code examples
haskellclosureslispelmself-reference

Implement a self-reference/pointer in a pure/functional language (Elm/Haskell)


Abstract Problem:

I'd like to implement a self-reference / pointer in Elm.

Specific Problem:

I'm writing a toy LISP interpreter in Elm inspired by mal.

I'm attempting to implement something like letrec to support recursive and mutually-recursive bindings (the "self reference" and "pointers" I'm mentioning above).

Here's some example code:

(letrec
  ([count (lambda (items)
            (if (empty? items)
                0
                (+ 1 (count (cdr items)))
            )
          )
  ])
  (count (quote 1 2 3))
)
;=>3

Note how the body of the lambda refers to the binding count. In other words, the function needs a reference to itself.

Deeper Background:

When a lambda is defined, we need to create a function closure which consists of three components:

  1. The function body (the expression to be evaluated when the function is called).
  2. A list of function arguments (local variables that will be bound upon calling).
  3. A closure (the values of all non-local variables that may be referenced within the body of the function).

From the wikipedia article:

Closures are typically implemented with [...] a representation of the function's lexical environment (i.e., the set of available variables) at the time when the closure was created. The referencing environment binds the non-local names to the corresponding variables in the lexical environment at the time the closure is created, additionally extending their lifetime to at least as long as the lifetime of the closure itself. When the closure is entered at a later time, possibly with a different lexical environment, the function is executed with its non-local variables referring to the ones captured by the closure, not the current environment.

Based on the above lisp code, in creating the lambda, we create a closure whose count variable must be bound to the lambda, thereby creating an infinite/circular/self-reference. This problem gets further complicated by mutually-recursive definitions which must be supported by letrec as well.

Elm, being a pure functional language, does not support imperative modification of state. Therefore, I believe that it is impossible to represent self-referencing values in Elm. Can you provide some guidance on alternatives to implementing letrec in Elm?

Research and Attempts

Mal in Elm

Jos von Bakel has already implemented mal in Elm. See his notes here and the environment implementation here. He's gone to great lengths to manually build a pointer system with its own internal GC mechanism. While this works, this seems like massive amounts of struggle. I'm craving a pure functional implementation.

Mal in Haskell

The mal implementation in Haskell (see code here) uses Data.IORef to emulate pointers. This also seems like hack to me.

Y-Combinator/Fixed Points

It seems possible that the Y-Combinator can be used to implement these self references. There seems to be a Y* Combinator that works for mutually recursive functions as well. It seems logical to me that there must also exist a Z* combinator (equivalent to Y* but supports the eager evaluation model of Elm). Should I transform all of my letrec instances so that each binding is wrapped around a Z*?

The Y-Combinator is new to me and my intuitive mind simply does not understand it so I'm not sure if the above solution will work.

Conclusion

Thank you very much for reading! I have been unable to sleep well for days as I struggle with this problem.

Thank You!

-Advait


Solution

  • In Haskell, this is fairly straightforward thanks to lazy evaluation. Because Elm is strict, to use the technique below, you would need to introduce laziness explicitly, which would be more or less equivalent to adding a pointer indirection layer of the sort you mentioned in your question.

    Anyway, the Haskell answer might be useful to someone, so here goes...

    Fundamentally, a self-referencing Haskell value is easily constructed by introducing a recursive binding, such as:

    let mylist = [1,2] ++ mylist in mylist
    

    The same principle can be used in writing an interpreter to construct self-referencing values.

    Given the following simple S-expression language for constructing potentially recursive / self-referencing data structures with integer atoms:

    data Expr = Atom Int | Var String | Cons Expr Expr | LetRec [String] [Expr] Expr
    

    we can write an interpreter to evaluate it to the following type, which doesn't use IORefs or ad hoc pointers or anything weird like that:

    data Value = AtomV Int | ConsV Value Value deriving (Show)
    

    One such interpreter is:

    type Context = [(String,Value)]
    
    interp :: Context -> Expr -> Value
    interp _ (Atom x) = AtomV x
    interp ctx (Var v) = fromJust (lookup v ctx)
    interp ctx (Cons ca cd) = ConsV (interp ctx ca) (interp ctx cd)
    interp ctx (LetRec vs es e)
      = let ctx' = zip vs (map (interp ctx') es) ++ ctx
        in  interp ctx' e
    

    This is effectively a computation in a reader monad, but I've written it explicitly because a Reader version would require using the MonadFix instance either explicitly or via the RecursiveDo syntax and so would obscure the details.

    The key bit of code is the case for LetRec. Note that a new context is constructed by introducing a set of potentially mutually recursive bindings. Because evaluation is lazy, the values themselves can be computed with the expression interp ctx' es using the newly created ctx' of which they are part, tying the recursive knot.

    We can use our interpreter to create a self-referencing value like so:

    car :: Value -> Value
    car (ConsV ca _cd) = ca
    
    cdr :: Value -> Value
    cdr (ConsV _ca cd) = cd
    
    main = do
      let v = interp [] $ LetRec ["ones"] [Cons (Atom 1) (Var "ones")] (Var "ones")
    
      print $ car $ v
      print $ car . cdr $ v
      print $ car . cdr . cdr $ v
      print $ car . cdr . cdr . cdr . cdr . cdr . cdr . cdr . cdr . cdr . cdr $ v
    

    Here's the full code, also showing an alternative interp' using the Reader monad with recursive-do notation:

    {-# LANGUAGE RecursiveDo #-}
    {-# OPTIONS_GHC -Wall #-}
    
    module SelfRef where
    
    import Control.Monad.Reader
    import Data.Maybe
    
    data Expr = Atom Int | Var String | Cons Expr Expr | LetRec [String] [Expr] Expr
    data Value = AtomV Int | ConsV Value Value deriving (Show)
    
    type Context = [(String,Value)]
    
    interp :: Context -> Expr -> Value
    interp _ (Atom x) = AtomV x
    interp ctx (Var v) = fromJust (lookup v ctx)
    interp ctx (Cons ca cd) = ConsV (interp ctx ca) (interp ctx cd)
    interp ctx (LetRec vs es e)
      = let ctx' = zip vs (map (interp ctx') es) ++ ctx
        in  interp ctx' e
    
    interp' :: Expr -> Reader Context Value
    interp' (Atom x) = pure $ AtomV x
    interp' (Var v) = asks (fromJust . lookup v)
    interp' (Cons ca cd) = ConsV <$> interp' ca <*> interp' cd
    interp' (LetRec vs es e)
      = mdo let go = local (zip vs vals ++)
            vals <- go $ traverse interp' es
            go $ interp' e
    
    car :: Value -> Value
    car (ConsV ca _cd) = ca
    
    cdr :: Value -> Value
    cdr (ConsV _ca cd) = cd
    
    main = do
      let u = interp [] $ LetRec ["ones"] [Cons (Atom 1) (Var "ones")] (Var "ones")
      let v = runReader (interp' $ LetRec ["ones"] [Cons (Atom 1) (Var "ones")] (Var "ones")) []
    
      print $ car . cdr . cdr . cdr . cdr . cdr . cdr . cdr . cdr . cdr . cdr $ u
      print $ car . cdr . cdr . cdr . cdr . cdr . cdr . cdr . cdr . cdr . cdr $ v