Consider a language with fully optional semicolons almost entirely as sugar, i.e.:
;; foo; bar;;;;
is validfoo bar foobar
is validif (+1); foo
is different to if (+1) foo
in semantics, so ;
cannot be considered whitespaceHere is an example parser:
{-# LANGUAGE OverloadedStrings #-}
import Text.Trifecta
import Text.Trifecta.Delta
import Text.PrettyPrint.ANSI.Leijen (putDoc, (<>), linebreak)
import Control.Monad.Trans.State.Strict
import Control.Applicative
type TestParser a = StateT Int Parser a
data AST a = Foo a | Bar a deriving (Show)
pFoo :: TestParser (AST (Delta, Int))
pFoo = curry Foo <$ string "foo" <*> position <* modify (+1) <*> get
pBar :: TestParser (AST (Delta, Int))
pBar = curry Bar <$ string "bar" <*> position <*> get
pStmt :: TestParser (AST (Delta, Int))
pStmt = semi *> pStmt <|> pFoo <|> pBar <?> "statement"
pTest :: TestParser [AST (Delta, Int)]
pTest = some pStmt
main :: IO ()
main
= do let res = parseByteString (evalStateT pTest 0)
(Directed "(test)" 0 0 0 0) ";;foo;bar;\nfoo;; foobarbar;;"
case res of
Success ast
-> print ast
Failure errdoc
-> putDoc (errdoc <> linebreak)
The problem I am having with such a parser is that I need to be able to skip over semicolons without committing to parse a pStmt
. At the moment the following error occurs:
(test):2:18: error: unexpected
EOF, expected: statement
foo;; foobarbar;;<EOF>
This is because it expects a statement (in semi *> pStmt
), however because stacked semicolons can sugar both the beginning and end of expressions I can't be sure I really want to expect/parse one before I already expect one.
One hack I developed was to have Nop
as a constructor in my AST, but I really don't want to do that -- it feels like a hack and with the number of semicolons in some documents it would greatly increase memory usage.
I am looking for solutions/suggestions.
Attempt at EBNF form of the desired grammar:
expr = "foo" | "bar"
expr with sugar = expr | ";"
program = { [white space], expr with sugar, [white space] }
Ok, here it is:
pStmt = pFoo <|> pBar
pWhiteStmt = do
many whitespace
p <- pStmt
many whitespace
return p
pTest = do
many semi
pS <- sepEndBy pWhiteStm (some semi)
eof
return pS
And test it:
> parse pTest "" ";;foo;bar;\nfoo;; foo;bar;bar;;"
Right ["foo","bar","foo","foo","bar","bar"]
> parse pTest "" ";;foo;bar;\nfoo;; foobarbar;;"
Left (line 2, column 10):
unexpected 'b'
expecting ";" or end of input
If we wish to have a valid "; foobarbar;"
, then we need to change pWhiteStmt
parser to next:
pWhiteStmt = do
many whitespace
p <- some pStmt
many whitespace
return p
And check it:
> parse pTest "" ";;foo;bar;\nfoo;; foobarbar;;"
Right [["foo"],["bar"],["foo"],["foo","bar","bar"]]
And finally, if we still wish to have valid "; foo bar baz;"
then we also need to change pTest
function to next:
pTest = do
many semi
pS <- sepEndBy (some pWhiteStm) (some semi)
eof
return pS
And test it
> parse pTest "" ";;foo;bar;\nfoo;; foo bar bar;;"
Right [[["foo"]],[["bar"]],[["foo"]],[["foo"],["bar"],["bar"]]]
If we have many parentheses , it is need to replace return p
to return (concat p)
in the pWhiteStmt