OK, I have the following problem:
I need to parse (or tokenize) the following text:
ASK "Hey dude, what's about \";\"" + "?";
ASK "How old are you?" INTO inAge;
ASK "This is a
multiline String with \";\";" + " can you parse this?"; ANSWER "Sure, i can!";
In the lexer, I tried it with modes:
ASK : 'ASK' -> pushMode(UNTILSEMI) ;
ANSWER : 'ANSWER' -> pushMode(UNTILSEMI) ;
mode UNTILSEMI;
ENDSEMI : ';'+ -> popMode ;
CONTENT : ~[;]+ ;
The parser would be:
askStmt: ASK CONTENT ENDSEMI;
answerStmt: ANSWER CONTENT ENDSEMI;
My problem: when there are semicolons inside the "strings", CONTENT ends at the first of them, so the tokenizer stops too early and the parser won't work.
I have no idea how to start. Should I manipulate the lexer directly? Can I do this with lexer modes?
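I don't see the need for lexical modes. Something like this would handle your example input correctly, because the STRING token consumes everything between the quotes, including semicolons and escaped quotes: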
parse
: ( question | answer )* EOF
;
question
: ASK expression ( INTO ID )? SEMI
;
answer
: ANSWER expression SEMI
;
expression
: expression PLUS expression
| STRING
| ID
;
ASK : 'ASK';
ANSWER : 'ANSWER';
INTO : 'INTO';
ID : [a-zA-Z]+;
PLUS : '+';
SEMI : ';';
SPACES : [ \t\r\n]+ -> skip;
STRING : '"' ( ~[\\"] | '\\' . )* '"';
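To see why the STRING rule makes the semicolon problem go away, here is a rough Python sketch that imitates the lexer rules above with regular expressions. It is not ANTLR and not generated code: the names TOKEN_SPEC and tokenize are made up for this illustration, and it only approximates ANTLR's behaviour by picking the longest match and breaking ties in favour of the rule listed first.

```python
import re

# Regex equivalents of the lexer rules above, in the same order.
# DOTALL lets the escape branch '\\.' match a newline, like '.' does in ANTLR.
TOKEN_SPEC = [
    (name, re.compile(pattern, re.DOTALL))
    for name, pattern in [
        ("ASK", r"ASK"),
        ("ANSWER", r"ANSWER"),
        ("INTO", r"INTO"),
        ("ID", r"[a-zA-Z]+"),
        ("PLUS", r"\+"),
        ("SEMI", r";"),
        ("SPACES", r"[ \t\r\n]+"),
        ("STRING", r'"(?:[^\\"]|\\.)*"'),
    ]
]


def tokenize(text):
    """Return (token name, matched text) pairs, skipping SPACES."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, regex in TOKEN_SPEC:
            match = regex.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches have equal length.
            if match and match.end() > best_end:
                best_name, best_end = name, match.end()
        if best_name is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if best_name != "SPACES":
            tokens.append((best_name, text[pos:best_end]))
        pos = best_end
    return tokens


# First line of the example input; the semicolon inside the quotes stays in STRING.
sample = r'''ASK "Hey dude, what's about \";\"" + "?";'''
print([name for name, _ in tokenize(sample)])
# prints ['ASK', 'STRING', 'PLUS', 'STRING', 'SEMI']
```

Only the last semicolon comes out as a SEMI token. The one between the escaped quotes is swallowed by STRING, which is why the parser rules can end each statement with a plain SEMI.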
Even if you don't need expressions and only want a handful of tokens, I don't see the need for lexical modes:
parse
: ( question | answer )* EOF
;
question
: ASK ~SEMI* SEMI OTHER*
;
answer
: ANSWER ~SEMI* SEMI OTHER*
;
ASK : 'ASK';
ANSWER : 'ANSWER';
SEMI : ';';
STRING : '"' ( ~[\\"] | '\\' . )* '"';
OTHER : ~[";];
which will parse your example input as follows: