I am trying to write a program that understands a language in which embedded (nested) comments are allowed, for example:
/* Here's a comment
/* This comment is further embedded */ second comment is closed
Must close first comment */
The whole block should be recognized as a single comment: the lexer must not stop at the first */ it sees unless it has seen only one comment opener before it.
This would be easy to do in C: I could keep a counter that is incremented on each comment opener and decremented on each comment closer. When the counter is at 0, we're in the "code section".
However, without having state in Haskell, it's a little more challenging.
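For reference, the counter idea does carry over to Haskell without mutable state if the depth is passed along as a function argument. The following is only a minimal sketch, independent of Alex; the name `stripComments` is hypothetical, and it simply drops comment text rather than producing tokens:

```haskell
-- Remove nested /* ... */ comments by threading the nesting depth
-- through the recursion instead of mutating a counter.
stripComments :: String -> String
stripComments = go 0
  where
    go :: Int -> String -> String
    go _ [] = []
    go d ('/':'*':rest) = go (d + 1) rest   -- comment opener: depth + 1
    go d ('*':'/':rest)
      | d > 0 = go (d - 1) rest             -- comment closer: depth - 1
    go 0 (c:rest) = c : go 0 rest           -- depth 0: "code section", keep the char
    go d (_:rest) = go d rest               -- inside a comment: drop the char

main :: IO ()
main = putStrLn (stripComments "a /* x /* y */ z */ b")
-- prints "a  b" (both spaces around the comment are kept)
```

A stray */ at depth 0 falls through the guard and is kept as ordinary text; a real lexer would probably report it as an error instead.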
I've read about the monadUserState wrapper, which supposedly lets you keep track of state for exactly this kind of parsing. However, I can't find much reading material on it apart from the tutorial page on Alex.
When I try to compile, I get the error:
templates\wrappers.hs:213:16: Not in scope: `alexEOF`
Note that I switched directly from the "basic" wrapper to "monadUserState" without changing my code (I don't know what to add in order to use it). The documentation says that the user state must be initialized in the user code:
data AlexState = AlexState {
    alex_pos   :: !AlexPosn,     -- position at current input location
    alex_inp   :: String,        -- the current input
    alex_chr   :: !Char,         -- the character before the input
    alex_bytes :: [Byte],        -- rest of the bytes for the current char
    alex_scd   :: !Int,          -- the current startcode
    alex_ust   :: AlexUserState  -- AlexUserState will be defined in the user program
}
I'm a bit of a lexing noob and I'm not at all sure what I should add here to make it at least compile; then I can worry about the logic of the thing.
Update: Working example available here: http://lpaste.net/119212
The file "tiger.x" in the alex GitHub repo contains an example of how to track embedded comments using the monadUserState wrapper.
Unfortunately, that example doesn't compile as-is, but the ideas in it should work.
Basically, these lines perform embedded comment processing:
<0> "/*" { enterNewComment `andBegin` state_comment }
<state_comment> "/*" { embedComment }
<state_comment> "*/" { unembedComment }
<state_comment> . ;
<state_comment> \n { skip }
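The actions named in those rules are ordinary Haskell functions in the trailing code section of the .x file. Here is a rough sketch of how they might look, modeled on tiger.x but simplified. It assumes a user-defined token type called `Tokens`, a user state with a single hypothetical field `commentDepth`, and a version of the wrapper that provides `alexGetUserState` and `alexSetUserState`. It is a fragment that only compiles as part of an Alex-generated module:

```haskell
-- User state required by the monadUserState wrapper.
-- commentDepth is a hypothetical field name for the nesting counter.
data AlexUserState = AlexUserState { commentDepth :: Int }

alexInitUserState :: AlexUserState
alexInitUserState = AlexUserState { commentDepth = 0 }

getCommentDepth :: Alex Int
getCommentDepth = do
  ust <- alexGetUserState
  return (commentDepth ust)

setCommentDepth :: Int -> Alex ()
setCommentDepth d = do
  ust <- alexGetUserState
  alexSetUserState (ust { commentDepth = d })

enterNewComment, embedComment, unembedComment :: AlexAction Tokens

-- First "/*" seen in start code 0: depth becomes 1.
-- The rule's `andBegin` switches to state_comment.
enterNewComment input len = do
  setCommentDepth 1
  skip input len

-- "/*" inside a comment: one level deeper.
embedComment input len = do
  d <- getCommentDepth
  setCommentDepth (d + 1)
  skip input len

-- "*/" inside a comment: one level up; when the outermost comment
-- closes, return to start code 0 (ordinary code).
unembedComment input len = do
  d <- getCommentDepth
  setCommentDepth (d - 1)
  if d == 1 then alexSetStartCode 0 else return ()
  skip input len
```

Each action ends with `skip`, so no token is produced for comment text and scanning simply continues with the next match.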
As for alexEOF, the idea is to add an EOF token to your token data type:
data Tokens = ... | EOF
and define alexEOF as:
alexEOF = return EOF
See the file tests/tokens_monadUserState_bytestring.x in the alex repo for an example of this.
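Once an EOF token exists, a driver can use it to know when to stop scanning. Below is a hedged sketch of such a loop for the trailing code section of the .x file. The constructor `TIdent` and the function `scanAll` are hypothetical placeholders, and `runAlex` and `alexMonadScan` are assumed to come from the (String-based) monadUserState wrapper:

```haskell
-- TIdent stands in for whatever tokens the lexer really produces.
data Tokens = TIdent String | EOF
  deriving (Eq, Show)

-- Called by the wrapper when the input is exhausted.
alexEOF :: Alex Tokens
alexEOF = return EOF

-- Run the lexer over the whole input and collect tokens up to EOF.
-- A Left result carries the wrapper's lexical error message.
scanAll :: String -> Either String [Tokens]
scanAll input = runAlex input loop
  where
    loop = do
      tok <- alexMonadScan
      if tok == EOF
        then return []
        else do
          rest <- loop
          return (tok : rest)
```

If unterminated comments should be reported, alexEOF is also a natural place to check whether the lexer is still inside a comment when the input ends.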