I am trying to build a formatter for Idris with Lightyear. The whole program so far is here:
https://github.com/hejfelix/IdrisFMT/blob/501a4a9e8b1b4154ed0d7836676c24d98de8b76a/IdrisFmt.idr
For now, the purpose is to tokenize the file itself and then pretty print it, i.e., the file as input should be a fixpoint.
The problem comes after each string literal, where my parser seems to eat up whitespace. If I put anything else than whitespace immediately after a string literal, it will parse both that character as well as all the following whitespace.
This sample program will show the error:
main2 : IO ()
main2 = putStrLn $ str
where
str = case parse tokenParser "\"IdrisFMT.idr\" \n" of
(Left l) => "failed" ++ show l
(Right r) => show $ map (show @{default}) r
This prints out:
*IdrisFMT> :exec main2
["StringLiteral(\"IdrisFMT.idr\")"]
If I change the string I'm parsing to "\"IdrisFMT.idr\"c \n"
, I get:
*IdrisFMT> :exec main2
["StringLiteral(\"IdrisFMT.idr\")", "Identifier(c)", "' '", "'\\n'"]
which is what I expected.
I believe the error arises from the way I parse the string literals, but I am failing to understand my mistake, and I'm having trouble finding a good way to debug lightyear parsers. The implementation of my string literal parser is as follows:
escape : Parser String
escape = do
d <- char '\\'
c <- oneOf "\\\"0nrvtbf"
pure $ pack $ (the $ List Char) [d,c]
nonEscape : Parser String
nonEscape = map (\x => pack $ (the $ List _) [x]) $ noneOf "\\\"\0\n\r\v\t\b\f"
character : Parser String
character = nonEscape <|>| escape
stringLiteralToken : Parser Token
stringLiteralToken = map (StringLiteral . concat) $ dquote (many character)
How can I prevent my string literal parser from eating up whitespace after the literal?
After chatting on the #idris channel, I was helped to understand that most of the built-in higher order parsers (e.g.dquote
) skip whitespace at the end.
In my case, this was not what I wanted. Instead, I used the between
function which takes 3 parameters, a parser for when to start, another for when to stop, and a third for whatever comes in between.
To parse string literals, I am now doing this:
escape : Parser String
escape = do
d <- char '\\'
c <- oneOf "\\\"0nrvtbf'"
pure $ pack $ (the $ List Char) [d,c]
nonEscape : Parser String
nonEscape = map (\x => pack $ (the $ List _) [x]) $ noneOf "\\\"\0\n\r\v\t\b\f"
character : Parser String
character = nonEscape <|>| escape
stringLiteralToken : Parser Token
stringLiteralToken = map (StringLiteral . concat) $ (between (char '"') (char '"')) (many character)
This solved my problem.