Tags: bison, dsl, context-free-grammar, jison

How do I define a string in JISON?


I am just getting into writing a DSL and would like to use JISON (http://zaach.github.io/jison). I am trying to learn the grammar syntax and am running into a problem with specifying a string of characters in double quotes.

What I thought would work is:

%lex
%%

[\n\s]+                 /* skip whitespace */
"true"|"false"          return 'BOOL'
"IF"                    return 'START'
"AND"|"OR"              return 'LOGIC'
<<EOF>>                 return 'EOF'
.                       return 'INVALID'

/lex

%start string
%%

string
    : '"' [^"]+ '"'
        {$$ = $2;}
    ;

... or perhaps:

%lex
%%

[\n\s]+                 /* skip whitespace */
"true"|"false"          return 'BOOL'
"IF"                    return 'START'
"AND"|"OR"              return 'LOGIC'
\"[^"]+\"               return 'STRING'
<<EOF>>                 return 'EOF'
.                       return 'INVALID'

/lex

%start string
%%

string
    : STRING
        {$$ = $1;}
    ;

The first one basically doesn't work at all, while the second one kind of works: when it finds a string, the value that comes out still includes the surrounding double quotes.
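The quotes come along because a lexer rule's `yytext` is the entire text the rule matched, delimiters included. As a rough illustration only (plain JavaScript applying the same regular expression with the sticky `y` flag, not Jison's generated lexer; `matchString` is a hypothetical helper), the second rule's match looks like this:

```javascript
// Same pattern as the lexer rule \"[^"]+\" ; the sticky flag (y) anchors the
// match at lastIndex, roughly the way a lexer tries a rule at the current position.
const STRING_RULE = /"[^"]+"/y;

// Return the whole matched text (like yytext) or null if the rule fails here.
function matchString(input, pos) {
  STRING_RULE.lastIndex = pos;
  const m = STRING_RULE.exec(input);
  return m ? m[0] : null;
}

const input = 'Name == "hello world"';
const yytext = matchString(input, input.indexOf('"'));
console.log(yytext);              // "hello world"  (quotes included)
console.log(yytext.slice(1, -1)); // hello world
```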

Is there a good resource for learning JISON/Bison/BNF grammar definitions? I have been looking around but haven't found anything that helps; I'm not a comp-sci major. Am I just missing something simple, or something more substantial?

For some context:

I am trying to define a simple DSL for parsing conditions like this:

IF Something > 100
AND Another == true
    doAction 2.51

Solution

  • You probably just need to trim the quotes:

    \"[^"]+\"         yytext = yytext.slice(1,-1); return 'STRING'
    

    Aside from toy languages, strings are usually a lot more complicated than just a sequence of characters surrounded by quotes. You normally at least have to deal with some form of escaping special characters:

    "A \t tab and a newline \n embedded in a \"string\"."
    

    Or SQL/CSV-style quote escaping, where a quote inside the string is doubled:

    "Embedded ""quoted string"" in a quoted string."
    

    And you might even want Perl/Bash-style variable substitution:

    "This gets really complicated: $ButSomePeopleLikeIt"
    

    So reprocessing the string is quite common, and not just to remove the delimiters. It can be done one character (or escape sequence) at a time with lexer start conditions, or as a separate post-processing step.
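A minimal sketch of the post-processing route in plain JavaScript, covering the backslash and SQL/CSV styles above. The two regular expressions are assumed replacements for `\"[^"]+\"` (that pattern stops at the first inner quote, so it can never see an escaped one), and the escape table and the function names `matchToken`, `unescapeString` and `undoubleString` are hypothetical. Calling something like `unescapeString` from the `STRING` rule's action, and making it visible to the generated lexer, is left to your own setup.

```javascript
// Assumed replacement for \"[^"]+\" : a quote, then any mix of backslash
// escapes (\\.) or characters that are neither a quote nor a backslash,
// then a closing quote. It allows \" inside the string and also accepts "".
const BACKSLASH_STRING = /"(?:\\.|[^"\\])*"/y;

// SQL/CSV style instead: a quote inside the string is written twice ("").
const DOUBLED_STRING = /"(?:[^"]|"")*"/y;

// Hypothetical escape table; extend or trim it to suit your DSL.
const ESCAPES = new Map([
  ["n", "\n"],
  ["t", "\t"],
  ['"', '"'],
  ["\\", "\\"],
]);

// Try one token pattern at `pos`; return the raw match (like yytext),
// delimiters included, or null if the pattern does not match there.
function matchToken(re, input, pos) {
  re.lastIndex = pos;
  const m = re.exec(input);
  return m ? m[0] : null;
}

// Backslash style: drop the delimiters, then replace each escape sequence.
// The scan runs left to right, so an escaped backslash followed by "n"
// is not mistaken for a newline escape.
function unescapeString(raw) {
  return raw.slice(1, -1).replace(/\\(.)/g, (seq, ch) => {
    if (!ESCAPES.has(ch)) {
      throw new SyntaxError(`Unknown escape sequence ${seq}`);
    }
    return ESCAPES.get(ch);
  });
}

// Doubled-quote style: drop the delimiters, then collapse "" to ".
function undoubleString(raw) {
  return raw.slice(1, -1).replace(/""/g, '"');
}

const a = matchToken(BACKSLASH_STRING, String.raw`"say \"hi\"" rest`, 0);
console.log(unescapeString(a)); // say "hi"

const b = matchToken(DOUBLED_STRING, '"say ""hi""" rest', 0);
console.log(undoubleString(b)); // say "hi"
```

Keeping the token regex responsible only for finding where the string ends, and doing the unescaping afterwards, keeps each piece simple; the start-condition approach may suit you better if you want per-escape error positions or `$variable` substitution.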