I'm writing a source-to-source transformation using Parsec, so I have a LanguageDef for my language and I build a TokenParser for it using Text.Parsec.Token.makeTokenParser:
myLanguage = LanguageDef { ...
commentStart = "/*"
, commentEnd = "*/"
...
}
-- defines 'stringLiteral', 'identifier', etc...
TokenParser {..} = makeTokenParser myLanguage
Unfortunately, each of the parser combinators in the TokenParser is a lexeme parser implemented in terms of whiteSpace, and because I defined commentStart and commentEnd, whiteSpace eats comments as well as spaces.
What is the right way to preserve comments in this situation?
Approaches I can think of:
- Don't define commentStart and commentEnd. Wrap each of the lexeme parsers in another combinator that grabs comments before parsing each token.
- Implement my own version of makeTokenParser (or perhaps use some library that generalizes Text.Parsec.Token; if so, which library?)
What's the done thing in this situation?
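For concreteness, here is a minimal sketch of the first approach. It assumes the comment fields are left empty (as they are in emptyDef), so whiteSpace only skips spaces. The names lexer, comment, withComments and identifierC are made up for illustration, and nested comments are not handled.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Language (emptyDef)
import qualified Text.Parsec.Token as Tok

-- emptyDef leaves commentStart, commentEnd and commentLine empty,
-- so the generated whiteSpace skips spaces only.
lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser emptyDef

-- Hypothetical helper: one /* ... */ comment, returning its text.
-- It is itself a lexeme, so trailing spaces are skipped.
comment :: Parser String
comment =
  Tok.lexeme lexer (try (string "/*") *> manyTill anyChar (try (string "*/")))

-- Hypothetical wrapper: collect the comments that precede a token.
withComments :: Parser a -> Parser ([String], a)
withComments p = (,) <$> many comment <*> p

-- Example: an identifier together with its leading comments.
identifierC :: Parser ([String], String)
identifierC = withComments (Tok.identifier lexer)
```

With these definitions, parsing "/* a */ /* b */ foo" with identifierC should give the pair ([" a ", " b "], "foo"). Every token parser would need the same wrapping, which is the tedious part of this approach.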
In principle, defining commentStart and commentEnd doesn't fit with preserving comments, because you need to treat comments as valid parts of both the source and the target language, including them in your grammar and your AST/ADT.
In this way, you'd be able to keep the text of the comment as the payload data of a Comment constructor, and output it appropriately in the target language, something like
data Statement = Comment String | Return Expression | ...
The fact that neither source nor target language sees the comment text as relevant is irrelevant for your translation code.
Major problem with this approach: it doesn't really fit well with makeTokenParser, and fits better with implementing your source language's parser from the ground up.
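As a rough illustration of the ground-up route, the sketch below treats a comment as an ordinary statement. Everything in it is an assumption made for the example: the Statement type is cut down to two constructors, Return carries a bare identifier instead of an Expression, and lexeme, comment, returnStmt and program are hypothetical names.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified, hypothetical AST: Return holds just a name here.
data Statement
  = Comment String
  | Return String
  deriving (Eq, Show)

-- Skip trailing spaces only; comments are never discarded.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- A /* ... */ comment becomes a node in the AST.
comment :: Parser Statement
comment =
  lexeme $ Comment <$> (try (string "/*") *> manyTill anyChar (try (string "*/")))

-- return <name> ;
returnStmt :: Parser Statement
returnStmt =
  lexeme $ Return <$>
    ( try (string "return" <* notFollowedBy alphaNum)
        *> spaces *> many1 alphaNum <* spaces <* char ';' )

program :: Parser [Statement]
program = spaces *> many (comment <|> returnStmt) <* eof
```

Parsing "/* keep me */ return x; /* tail */" with program should yield [Comment " keep me ", Return "x", Comment " tail "], so the printer for the target language can emit each comment where it occurred. Comments that appear inside an expression would need their own place in the grammar.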
I guess I'm veering towards editing makeTokenParser to just get the comment parsers to return the String instead of ().
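A rough sketch of what that could look like, written as standalone combinators and not as a patch to the real makeTokenParser. The names blockComment, whiteSpaceC and lexemeC are hypothetical, the delimiters are hard-coded to /* and */, and commentLine and nestedComments are ignored.

```haskell
import Data.Maybe (catMaybes)
import Text.Parsec
import Text.Parsec.String (Parser)

-- A block comment that returns its text instead of ().
blockComment :: Parser String
blockComment = try (string "/*") *> manyTill anyChar (try (string "*/"))

-- Like whiteSpace, but returns the comments it skipped over.
whiteSpaceC :: Parser [String]
whiteSpaceC =
  catMaybes <$> many (Nothing <$ skipMany1 space <|> Just <$> blockComment)

-- Like lexeme, but pairs the token with the comments that follow it.
lexemeC :: Parser a -> Parser (a, [String])
lexemeC p = (,) <$> p <*> whiteSpaceC
```

For example, running lexemeC (many1 letter) on "foo /* c1 */  /* c2 */bar" should return ("foo", [" c1 ", " c2 "]) and leave "bar" unconsumed. Whether comments attach to the preceding or the following token is a design choice; this sketch follows the trailing-whitespace convention that lexeme parsers in Text.Parsec.Token use.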