How to define a working lexer and parser (e.g. flex and bison) to support C++0x-style raw string literals?
As you may already know, new string literals in C++0x can be expressed in a very flexible way.
R"<delim>...<delim>";
In this code, <delim> can be pretty much anything, and no escape characters are needed.
Any kind of parentheses can be used to delimit the end of string:
R"(I love those who yearn for the impossible. (Von Goethe, "Faust"))";
Blocks of text can be defined simply by using the same sequence of characters at both ends:
R";***************************(
; TINY BASIC FOR INTEL 8080
; VERSION 2.0
; BY LI-CHEN WANG
; MODIFIED AND TRANSLATED
; TO INTEL MNEMONICS
; BY ROGER RAUSKOLB
; 10 OCTOBER, 1976
; @COPYLEFT
; ALL WRONGS RESERVED )
;***************************";
More information can be found here (Wikipedia) and here (AT&T).
I would like to use this fantastic feature in a language I am developing now.
So, how can I define a proper tokenizer and syntax analyzer to achieve this result?
Thanks in advance for your answers!
You could preprocess the literals at the lexical analysis stage and transform each one into something like a meta token.
Input:
int a;
char *b = R"....";
Preprocessed:
int a;
char *b = R*literal[0]*;
Tokenized:
INT symbol[0] DELIM
CHAR OP_ASTR symbol[1] OP_EQ symbol[2] *literal[0]* DELIM
Symbol table contents { "a", "b", "R" }
Literal table contents { "...." }
literal[0] is the pointer to the original literal text.
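The preprocessing step above could be sketched roughly as follows. This is only an illustration, not a complete lexer: the function name extract_raw_literals is hypothetical, and the sketch assumes the form that was eventually standardized in C++11, R"delim( ... )delim", which is stricter than the free-form delimiters described in the question. It also does not skip ordinary string literals or comments, which a real tokenizer would have to handle first.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: replaces the quoted part of every raw string literal
// in `src` with a placeholder *literal[N]* (the leading R is kept, as in the
// "Preprocessed" example) and appends the raw body to `literals`.
// Assumption: literals follow the C++11 form R"delim( ... )delim".
std::string extract_raw_literals(const std::string& src,
                                 std::vector<std::string>& literals) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        // R" starts a raw literal only if the R is not the tail of an identifier.
        bool prev_is_ident =
            i > 0 && (std::isalnum(static_cast<unsigned char>(src[i - 1])) ||
                      src[i - 1] == '_');
        bool starts_raw = src[i] == 'R' && i + 1 < src.size() &&
                          src[i + 1] == '"' && !prev_is_ident;
        if (starts_raw) {
            std::size_t open = src.find('(', i + 2);
            if (open != std::string::npos) {
                // Everything between the quote and '(' is the delimiter.
                std::string delim = src.substr(i + 2, open - (i + 2));
                bool delim_ok =
                    delim.size() <= 16 &&
                    delim.find_first_of(" \t\n\"\\)") == std::string::npos;
                // The literal ends at the first occurrence of )delim"
                std::string closer = ")" + delim + "\"";
                std::size_t close =
                    delim_ok ? src.find(closer, open + 1) : std::string::npos;
                if (close != std::string::npos) {
                    literals.push_back(src.substr(open + 1, close - open - 1));
                    out += "R*literal[" +
                           std::to_string(literals.size() - 1) + "]*";
                    i = close + closer.size();
                    continue;
                }
            }
        }
        out += src[i++];  // Not a raw literal: copy the character through.
    }
    return out;
}
```

The output of this pass contains no raw literals anymore, so it can be fed to an ordinary flex scanner that only needs one extra rule for the *literal[N]* placeholder. In flex itself, the same effect is usually achieved with an exclusive start condition or a hand-written input loop inside the rule action, because the closing delimiter depends on the opening one and cannot be expressed as a fixed regular expression.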