java antlr antlr4 grammar

ANTLR Grammar not


I started writing another C-like language a few days ago and I've gotten stuck here.

The "pointers" rule seems to collide with the * operator in the OP token, so * is no longer recognized as an operator in the "expr" rule. The same happens with the & operator and the "reference" rule. How can I fix this?

grammar C;

program
  : (include | var_decl | boigaCall ';' | func_decl | typedef ';')*;

stmt
  : if_stmt
  | repeat_stmt
  | var_decl
  | var_change
  | function_call ';'
  | return_stmt ';'
  | boigaCall ';'
  | switch_stmt
  | '{' stmt* '}';

if_stmt
  : if_part else_part?;

if_part
  : 'if' paren_expr stmt;

else_part
  : 'else' stmt;

repeat_stmt
  : 'repeat' '(' expr ')' stmt;

var_decl
  : type name=ID ('=' expr)? ';';

var_change
  : pointers? name=ID ('=' | VARIABLE_MODIFIER) expr ';';

func_decl
  : (inline='inline')? type recursion? noturbo? name=ID '(' functionArgs ')' stmt;

functionArgs
  : ((ID name=ID) (',' ID name=ID)*?)?;

recursion
  : '!';

noturbo
  : '?';

paren_expr
  : '(' expr ')';

function_call
  : ID '(' expr? (',' expr)* ')';

return_stmt
  : 'return' expr?;

typedef
  : 'typedef' structdef ID;

structdef
  : 'struct' '{' (structelem ';')+ '}';

switch_stmt
  : 'switch' paren_expr switch_chain;

switch_chain
  : '{' case_block+ default_block? '}';

case_block
  : 'case' expr ':' stmt* 'break' ';';

default_block
  : 'default' ':' stmt*;

structelem
  : typedName;

typedName
  : ID name=ID;

expr
  : pointers expr
  | term
  | expr OP expr
  | cast expr
  | '(' expr ')';

term
  : ID | INT | HEX | BIN | FLOAT | STRING | boigaCall | sizeOf | function_call | reference ID;

sizeOf
  : 'sizeof' '(' ID ')';

boigaCall
  : '__boiga' '(' STRING (',' expr)* ')';

cast
  : '(' type ')';

pointers
  : '*'+;

reference
  : '&';

type
  : ID pointers?;

include
  : '#include' (LIBRARY | STRING);


fragment DIGIT: [0-9];
fragment LETTER: [a-zA-Z];
fragment HEX_CHAR: [a-fA-F];

STRING        : '"' (~'"'|'\\"')* '"';
LIBRARY       : '<' [a-zA-Z.]* '>';
ID            : (LETTER | '_')+ (LETTER | '_' | DIGIT)*;
INT           : '-'? DIGIT+;
HEX           : '0x' (DIGIT | HEX_CHAR)+;
BIN           : '0b' ('0' | '1')+;
FLOAT         : '-'? DIGIT+ '.' DIGIT+;
VARIABLE_MODIFIER : OP '=';
OP            : '+' | '-' | '*' | '/' | '%' | '==' | '!=' | '<' | '<=' | '>' | '>=' | '&&' | '||' | '&' | '|' | '^' | '>>' | '<<';

COMMENT       : SINGLE_COMMENT | BLOCK_COMMENT;
SINGLE_COMMENT: '//' .*? '\n';
BLOCK_COMMENT : '/*' .*? '*/';
WS: ([ \t\r\n] | COMMENT)+ -> skip;

I tried making * and & their own tokens and using those tokens in the "pointers" and "reference" rules, but then * and & were recognized only as operators and no longer as pointers/references. I tested the "program" rule with var x = a*b; and var x = a&b;, which exercise the part of the "expr" rule that is not working.


Solution

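  • If you move '*' out of OP and into its own lexer rule, the grammar works. Because you used the literal '*' inside the pointers parser rule, ANTLR created an implicit token for it. Implicit tokens are defined ahead of your explicit lexer rules, and when two rules match the same text the one defined first wins, so every * is lexed as that implicit token and never as OP.

    When a lexer rule matches exactly this literal, ANTLR reuses that rule for the literal instead of creating a duplicate implicit token. In parser rules you can therefore write either '*' or POW.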

    expr
      : term
      | pointers expr
      | expr (OP | POW) expr
      | cast expr
      | '(' expr ')';
    
    POW: '*';
    OP: '+' | '-' | '/' | '%' | '==' | '!=' | '<' | '<=' | '>' | '>=' | '&&' | '||' | '&' | '|' | '^' | '>>' | '<<';
    

    Input: a*b*c

    Parse tree 1

    Input: a***b

    Parse tree 2
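
    The question also mentions & and the reference rule, which presumably runs into the same implicit-token problem. Below is a minimal sketch of the same fix applied to both characters. AMP is a name I made up for illustration, and the VARIABLE_MODIFIER change is my own assumption: once '*' and '&' leave OP, OP '=' can no longer match *= or &=, so the compound-assignment rule probably needs to list the new tokens as well. Treat this as a starting point rather than a tested grammar.

    ```antlr
    // Sketch only: replaces the corresponding rules in the question's grammar.
    // AMP is a hypothetical token name, not part of the original answer.
    expr
      : term
      | pointers expr
      | expr (OP | POW | AMP) expr   // '*' and '&' are binary operators again
      | cast expr
      | '(' expr ')';

    pointers
      : POW+;                        // same token, used as a pointer prefix

    reference
      : AMP;                         // same token, used as address-of

    // Assumption: keeps *= and &= working now that OP excludes '*' and '&'.
    VARIABLE_MODIFIER : (OP | POW | AMP) '=';
    POW : '*';
    AMP : '&';
    // '&&' stays in OP; the lexer prefers the longest match, so "&&" is still one OP token.
    OP  : '+' | '-' | '/' | '%' | '==' | '!=' | '<' | '<=' | '>' | '>=' | '&&' | '||' | '|' | '^' | '>>' | '<<';
    ```

    The idea is that the lexer emits one token per character sequence regardless of context, and the parser decides from position whether POW or AMP is a prefix or a binary operator.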