Search code examples
haskellfunctional-programmingparsecparser-combinatorsmegaparsec

Recursing to a function that doesn't exist yet in Haskell


I'm stuck on a problem with writing a parser in Haskell that I hope someone can help out with!

It is a bit more complicated than my usual parser because there are two layers of parsing. First a language definition is parsed into an AST, then that AST is transformed into another parser that parses the actual language.

I have made pretty good progress so far but I'm stuck on implementing recursion in the language definition. As the language definition is transformed from AST into a parser in a recursive function, I can't work out how it can call itself if it doesn't exist yet.

I'm finding it a bit hard to explain my problem, so maybe an example will help.

The language definition might define that a language consists of three keywords in sequence and then optional recursion in brackets.

A B C ($RECURSE)

Which would be parsed into an AST like:

[Keyword "A", Keyword "B", Keyword "C", Optional (Many [Recurse])]

The Many is not really required for this example, but in my actual project, optional blocks can have multiple syntax elements in them so an Optional would contain a Many with n elements.

I would then want it to get transformed into a parser that parses strings like:

A B C
A B C (A B C)
A B C (A B C (A B C))

I've boiled down my project into the simplest possible example. You can see my TODO comment where I'm stuck trying to implement the recursion.

{-# LANGUAGE OverloadedStrings #-}

module Example
  ( runExample,
  )
where

import Control.Applicative hiding (many, some)
import Data.Text (Text)
import Data.Void
import System.IO as SIO
import Text.Megaparsec hiding (State)
import Text.Megaparsec.Char (space1, string')
import qualified Text.Megaparsec.Char.Lexer as L
import Text.Megaparsec.Debug
import Text.Pretty.Simple (pPrint)

-- Types

type Parser = Parsec Void Text

data SyntaxAst = Keyword Text | Recurse | Optional SyntaxAst | Many [SyntaxAst]

--  Megaparsec Base Parsers

-- Space consumer - used by other parsers to ignore whitespace
sc :: Parser ()
sc =
  L.space
    space1
    (L.skipLineComment "--")
    (L.skipBlockComment "/*" "*/")

-- Runs a parser, then consumes any left over space with sc
lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

-- Parses a string, then consumes any left over space with sc
symbol :: Text -> Parser Text
symbol = L.symbol sc

-- Parses something between parentheses
inParens :: Parser a -> Parser a
inParens =
  between
    (symbol "(")
    (symbol ")")

-- Transforms the AST into a parser
transformSyntaxExprToParser :: SyntaxAst -> Parser [Text]
transformSyntaxExprToParser (Many exprs) = dbg "Many" (createParser exprs)
transformSyntaxExprToParser (Keyword text) = dbg "Keyword" (pure <$> lexeme (string' text))
transformSyntaxExprToParser (Optional inner) = dbg "Optional" (option [] (try (inParens (transformSyntaxExprToParser inner))))
transformSyntaxExprToParser Recurse = dbg "Recurse" (pure ["TODO"]) -- TODO: How do I recurse here?
-- transformSyntaxExprToParser s Recurse = dbg "Recurse" (createParser s) -- Seems to work in the example, but in my actual application creates an infinite loop and freezes

-- Walks over the parser AST and convert it to a parser
createParser :: [SyntaxAst] -> Parser [Text]
createParser expressions =
  do
    foldr1 (liftA2 (<>)) (fmap transformSyntaxExprToParser expressions)

runExample :: IO ()
runExample = do
  -- To make the example simple, lets cut out the language definition parsing and just define
  -- it literally.
  let languageParser = createParser [Keyword "A", Keyword "B", Keyword "C", Optional (Many [Recurse])]
  let run p = runParser p "" "A B C (A B C (A B C))"
  let result = run languageParser
  case result of
    Left bundle -> SIO.putStrLn (errorBundlePretty bundle)
    Right xs -> pPrint xs

A few things I've tried:

  1. Pass the original AST up to the transformSyntaxExprToParser function and call createParser when the Recurse token is encountered. This didn't work due to infinite loops.
  2. Using mutable references like IORef/STRef to pass in a reference that is updated to reference the final parser once the transformation is finished. I couldn't work out how to thread the IO/ST monads into the parser transform function.
  3. State monads. I couldn't work out how to pass a reference through the state monad.

I hope that makes sense, let me know if I need to elaborate more. I can also push up my full project if it will help.

Thanks for reading!

Edit: I've made changes to my original example to demonstrate the infinite loop problem (integrating the excellent suggestions in the answer below) at https://pastebin.com/DN0JJ9BA


Solution

  • I believe you can use laziness here. Pass the final parser as a parameter to transformSyntaxExprToParser, and when you see a Recurse, return that parser.

    transformSyntaxExprToParser :: Parser [Text] -> SyntaxAst -> Parser [Text]
    transformSyntaxExprToParser self = go
      where
        go (Keyword text) = dbg "Keyword" (pure <$> lexeme (string' text))
        go (Optional inner) = dbg "Optional" (option [] (try (inParens (go inner))))
        go Recurse = dbg "Recurse" self
    
    createParser :: [SyntaxAst] -> Parser [Text]
    createParser expressions = parser
      where
        parser = foldr1 (liftA2 (<>))
          (fmap (transformSyntaxExprToParser parser) expressions)
    

    This ought to produce exactly the same kind of recursive parser as if you had written it directly. A Parser is ultimately just a data structure which you can construct using its instances of Monad, Applicative, Alternative, &c.

    Your idea of doing this with a mutable reference such as an IORef is essentially what’s happening under the hood anyway when constructing and evaluating a thunk.

    Your idea here was almost correct:

    Pass the original AST up to the transformSyntaxExprToParser function and call createParser when the Recurse token is encountered. This didn't work due to infinite loops.

    The problem is that you were constructing a new parser for every Recurse, from the same input, which contains a Recurse, thus constructing a new parser…and so on. What my code above does is just pass in the same parser.

    If you need to perform monadic side effects while constructing the parser, such as logging, then you can use a recursive do, for example, with some hypothetical MonadLog class for illustration:

    {-# Language RecursiveDo #-}
    
    transformSyntaxExprToParser :: (MonadLog m) => Parser [Text] -> SyntaxAst -> m (Parser [Text])
    transformSyntaxExprToParser self = go
      where
        go (Keyword text) = do
          logMessage "Got ‘Keyword’"
          pure $ dbg "Keyword" (pure <$> lexeme (string' text))
        go (Optional inner) = do
          logMessage "Got ‘Optional’"
          inner' <- go inner
          pure $ dbg "Optional" (option [] (try (inParens inner')))
        go Recurse = do
          logMessage "Got ‘Recurse’"
          pure $ dbg "Recurse" self
    
    createParser :: (MonadFix m, MonadLog m) => [SyntaxAst] -> m (Parser [Text])
    createParser expressions = do
      rec
        parser <- fmap (foldr1 (liftA2 (<>)))
          (traverse (transformSyntaxExprToParser parser) expressions)
      pure parser
    

    The rec block introduces a recursive binding which you may construct using side effects. In general, some care is required to ensure that recursive definitions like this are sufficiently lazy, that is, you don’t force the result sooner than intended, but here the recursion pattern is very simple, and you never examine the self parser, only treat it as a black box to hook up to other parsers.

    This method also makes it explicit what the scope of a Recurse is, and opens the possibility of introducing local recursive parsers, with a new call to transformSyntaxExprToParser with a new local self argument.