
ANTLR4 Grammar/Rules - issue printing a concatenated variable in BASIC


I want to create a BASIC-like (high-level) language using ANTLR4.

The test input below creates a variable called $C and assigns it an integer value. The assignment works, and so does the print statement, except when I concatenate the variable onto the string:

     ************ TEST CASE ****************

$C=15;

print "dangerdanger!"; # print works

print "Number of GB left=" + $C;

[Screenshot: Parse Tree Inspector output]

Using the Parse Tree Inspector I can see that the assignment is parsed correctly, but when the parser reaches the variable concatenated onto the string it reports: mismatched input '+' expecting STMTEND.
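To see where the '+' comes from, here is a minimal Python sketch that translates the STRING lexer rule from the rules below into a regular expression. It is an approximation under stated assumptions: only the double-quoted branch is modelled and escape sequences are ignored. The function name string_token is hypothetical. It shows that with spaces around '+' the STRING token ends at the closing quote, so the rest of the line is lexed as separate CONCAT and VARNAME tokens (assuming whitespace is skipped by a rule not shown), and print_command : PRINT STRING STMTEND then finds a CONCAT where it expects STMTEND.

```python
import re

# The question's STRING lexer rule as a Python regex. Assumption for brevity:
# only the double-quoted branch is modelled and escape sequences are ignored.
QUESTION_STRING = re.compile(r'"[^"\\]*"(?:\+"[^"\\]*"|\+\$[A-Z]*)*')


def string_token(text):
    """Return the prefix of `text` that one STRING token would cover, or None."""
    match = QUESTION_STRING.match(text)
    return match.group() if match else None


# With spaces around '+', the token stops at the closing quote...
print(string_token('"Number of GB left=" + $C;'))   # "Number of GB left="
# ...and only without spaces does the variable become part of the STRING token.
print(string_token('"Number of GB left="+$C;'))     # "Number of GB left="+$C
```

In other words, the lexer rule only "works" for input written without spaces, which is a strong hint that concatenation belongs in the parser.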

Could anyone help me work out what adjustments I need to make to my rules and grammar to fix this?

Many thanks in advance.

Kevin

PS. As a side issue, I would rather use C$ than $C, but it's early days...

********RULES************


VARNAME : '$'('A'..'Z')* 
        ;

CONCAT  : '+'
        ;
STMTEND : SEMICOLON NEWLINE* | NEWLINE+
        ;
STRING  : SQUOTED_STRING (CONCAT SQUOTED_STRING | CONCAT VARNAME)*
    | DQUOTED_STRING (CONCAT DQUOTED_STRING | CONCAT VARNAME)*
        ; 
fragment SQUOTED_STRING : '\'' (~['])* '\''
    ;

fragment DQUOTED_STRING  
    :  '"' ( ESC_SEQ| ~('\\'|'"') )* '"'  
    ;  

fragment ESC_SEQ  
    :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')  
    |   UNICODE_ESC  
    |   OCTAL_ESC  
    ;  

fragment OCTAL_ESC  
    :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')  
    |   '\\' ('0'..'7') ('0'..'7')  
    |   '\\' ('0'..'7')  
    ;  

fragment HEX_DIGIT : ('0'..'9' | 'a'..'f' | 'A'..'F')
    ;

fragment UNICODE_ESC :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT  
    ;  

SEMICOLON : ';' 
    ;

NEWLINE : '\r'?'\n'
    ;


************GRAMMAR************

print_command
    :   PRINT STRING STMTEND #printCommandLabel
    ;

assignment
    : VARNAME EQUALS INTEGER STMTEND #assignInteger 
    | VARNAME EQUALS STRING STMTEND #assignString
    ;

Solution

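  • You shouldn't try to build concatenation expressions in your lexer: that is the parser's job. Your STRING rule only absorbs "+ $C" when there is no whitespace around the '+', so for "Number of GB left=" + $C the lexer produces separate STRING, CONCAT and VARNAME tokens, and print_command has no alternative that allows a CONCAT after the STRING. Let the parser handle concatenation through an expression rule instead. Something like this should do it: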

    print_command
     :   PRINT expression STMTEND #printCommandLabel
     ;
    
    assignment
     : VARNAME EQUALS expression STMTEND
     ;
    
    expression
     : expression CONCAT expression
     | INTEGER
     | STRING
     | VARNAME
     ;
    
    CONCAT
     : '+'
     ;
    
    VARNAME 
     : '$'('A'..'Z')* 
     ;
    
    STMTEND 
     : SEMICOLON NEWLINE* 
     | NEWLINE+
     ;
    
    STRING
     : SQUOTED_STRING
     | DQUOTED_STRING
     ; 
    
    fragment SQUOTED_STRING
     : '\'' (~['])* '\''
     ;
    
    fragment DQUOTED_STRING  
     : '"' ( ESC_SEQ| ~('\\'|'"') )* '"'  
     ;  
    
    fragment ESC_SEQ  
     : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')  
     | UNICODE_ESC  
     | OCTAL_ESC  
     ;  
    
    fragment OCTAL_ESC  
     : '\\' ('0'..'3') ('0'..'7') ('0'..'7')  
     | '\\' ('0'..'7') ('0'..'7')  
     | '\\' ('0'..'7')  
     ;  
    
    fragment HEX_DIGIT : ('0'..'9' | 'a'..'f' | 'A'..'F');
    
    fragment UNICODE_ESC :   '\\' 'u' HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT;  
    
    fragment SEMICOLON : ';';
    
    fragment NEWLINE : '\r'?'\n';
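For illustration only, here is a minimal hand-written Python sketch of what a parser built on an expression rule does with the test case. It is not ANTLR-generated code, and several details are assumptions because their rules are not shown: PRINT is the keyword print, EQUALS is '=', INTEGER is [0-9]+, whitespace is skipped, strings have no escapes, and integers are kept as text so that CONCAT always concatenates. The print statement is parsed as print followed by an expression, which is what the test case needs. The names tokenize and Interpreter are hypothetical.

```python
import re

# Hypothetical token definitions: STRING is a plain literal and '+' is always
# its own CONCAT token. PRINT, EQUALS, INTEGER and the skipped whitespace are
# assumptions, since their rules are not shown in the question.
TOKEN_RULES = [(name, re.compile(pattern)) for name, pattern in [
    ("PRINT", r"print"),
    ("STRING", r'"[^"\\]*"'),
    ("VARNAME", r"\$[A-Z]*"),
    ("CONCAT", r"\+"),
    ("EQUALS", r"="),
    ("INTEGER", r"[0-9]+"),
    ("STMTEND", r";\n*|\n+"),
    ("WS", r"[ \t]+"),
]]


def tokenize(text):
    """Longest match wins; on a tie the earlier rule wins, as in ANTLR."""
    tokens, pos = [], 0
    while pos < len(text):
        best_type, best_text = None, ""
        for name, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            if match and len(match.group()) > len(best_text):
                best_type, best_text = name, match.group()
        if best_type is None:
            raise SyntaxError(f"token recognition error at {pos}: {text[pos]!r}")
        if best_type != "WS":  # whitespace is skipped
            tokens.append((best_type, best_text))
        pos += len(best_text)
    return tokens


class Interpreter:
    """Hand-written recursive-descent mirror of the parser rules."""

    def __init__(self):
        self.variables = {}
        self.output = []
        self.tokens, self.pos = [], 0

    def run(self, source):
        self.tokens, self.pos = tokenize(source), 0
        while self.pos < len(self.tokens):
            if self.peek() == "PRINT":      # PRINT expression STMTEND
                self.expect("PRINT")
                self.output.append(self.expression())
            else:                           # VARNAME EQUALS expression STMTEND
                name = self.expect("VARNAME")
                self.expect("EQUALS")
                self.variables[name] = self.expression()
            self.expect("STMTEND")
        return self.output

    def peek(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, token_type):
        if self.peek() != token_type:
            raise SyntaxError(f"mismatched input {self.peek()} expecting {token_type}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    def expression(self):
        # 'expression CONCAT expression' is left-recursive, which ANTLR4 accepts
        # but plain recursive descent cannot, so it becomes a loop here.
        value = self.operand()
        while self.peek() == "CONCAT":
            self.expect("CONCAT")
            value += self.operand()
        return value

    def operand(self):
        kind = self.peek()
        if kind == "INTEGER":
            return self.expect("INTEGER")        # kept as text so '+' concatenates
        if kind == "STRING":
            return self.expect("STRING")[1:-1]   # strip the surrounding quotes
        if kind == "VARNAME":
            return self.variables[self.expect("VARNAME")]
        raise SyntaxError(f"unexpected input {kind}")


program = '$C=15;\nprint "dangerdanger!";\nprint "Number of GB left=" + $C;\n'
print(Interpreter().run(program))  # ['dangerdanger!', 'Number of GB left=15']
```

In an actual ANTLR4 project the same evaluation would typically live in a visitor or listener over the generated parse tree, rather than in a hand-written parser like this one.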