I am trying to write a grammar that can parse the following 3 inputs:
-- testfile --
class hi implements ho:
var x:int;
end;
-- testfile2 --
interface xs:
myFunc(int,int):int
end;
-- testfile3 --
class hi implements ho:
method myMethod(x:int)
return y;
end
end;
This is lexer.l:
%{
#include <stdio.h>
#include <stdlib.h>
#include "parser.tab.h"
#include <string.h>
int line_number = 0;
void lexerror(char *message);
%}
newline (\n|\r\n)
whitespace [\t \n\r]*
digit [0-9]
alphaChar [a-zA-Z]
alphaNumChar ({digit}|{alphaChar})
hexDigit ({digit}|[A-Fa-f])
decNum {digit}+
hexNum {digit}{hexDigit}*H
identifier {alphaChar}{alphaNumChar}*
number ({hexNum}|{decNum})
comment "/*"[.\r\n]*"*/"
anything .
%s InComment
%option noyywrap
%%
<INITIAL>{
interface return INTERFACE;
end return END;
class return CLASS;
implements return IMPLEMENTS;
var return VAR;
method return METHOD;
int return INT;
return return RETURN;
if return IF;
then return THEN;
else return ELSE;
while return WHILE;
do return DO;
not return NOT;
and return AND;
new return NEW;
this return THIS;
null return _NULL;
":" return COL;
";" return SCOL;
"(" return BRACL;
")" return BRACR;
"." return DOT;
"," return COMMA;
"=" return ASSIGNMENT;
"+" return PLUS;
"-" return MINUS;
"*" return ASTERISK;
"<" return LT;
{decNum} {
yylval = atoi(yytext);
return DEC;
}
{hexNum} {
const int len = strlen(yytext)-1;
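char* substr = (char*) malloc(sizeof(char) * (len + 1)); /* +1 for the terminating '\0' */
strncpy(substr,yytext,len);
substr[len] = '\0'; /* strncpy does not terminate the string here */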
yylval = (int)strtol
( substr
, NULL
, 16);
free (substr);
return HEX;
}
{identifier} {
yylval= (char *) malloc(sizeof(char)*(strlen(yytext)+1)); /* +1 for the terminating '\0' */
strcpy(yylval, yytext);
return ID;
}
{whitespace} {}
"/*" BEGIN InComment;
}
{newline} line_number++;
<InComment>{
"*/" BEGIN INITIAL;
{anything} {}
}
. lexerror("Illegal input");
%%
void lexerror(char *message)
{
fprintf(stderr,"Error: \"%s\" in line %d. = %s\n",
message,line_number,yytext);
exit(1);
}
This is parser.y:
%{
# include <stdio.h>
int yylex(void);
void yyerror(char *);
extern int line_number;
%}
%start Program
%token INTERFACE END CLASS IMPLEMENTS VAR METHOD INT RETURN IF THEN ELSE
%token WHILE DO NOT AND NEW THIS _NULL EOC SCOL COL BRACL BRACR DOT COMMA
%token ASSIGNMENT PLUS ASTERISK MINUS LT EQ DEC HEX ID NEWLINE
%%
Program: INTERFACE Interface SCOL { printf("interface\n"); }
| CLASS Class SCOL { printf("class\n");}
| error { printf("error on: %s\n", $$); }
;
Interface: ID COL
AbstractMethod
END
;
AbstractMethod: ID BRACL Types BRACR COL Type
;
Types : Type COMMA Types
| Type
;
Class: ID
IMPLEMENTS ID COL
Member SCOL
END
;
Member: VAR ID COL Type
| METHOD ID BRACL Pars BRACR Stats END
;
Type: INT
| ID
;
Pars: Par COMMA Pars
| Par
;
Par: ID COL Type
;
Stats: Stat SCOL Stat
| Stat
;
Stat: RETURN Expr
| IF Expr THEN Stats MaybeElse END
| WHILE Expr DO Stats END
| VAR ID COL Type COL ASSIGNMENT Expr
| ID COL ASSIGNMENT Expr
| Expr
;
MaybeElse :
| ELSE Stats
;
Expr: NOT Term
| NEW ID
| Term PLUS Term
| Term ASTERISK Term
| Term AND Term
| Term ArithOp Term
| Term
;
ArithOp: MINUS
| LT
| ASSIGNMENT
;
Term: BRACL Expr BRACR
| Num
| THIS
| ID
| Term DOT ID BRACL Exprs BRACR
| error { printf("error in term: %s\n", $$); }
;
Num : HEX
| INT
;
Exprs : Expr COMMA Exprs
| Expr
;
%%
void yyerror(char *s) {
fprintf(stderr, "Parse Error on line %i: %s\n", line_number, s);
}
int main(void){
yyparse();
}
The first two inputs are recognized as expected. However, the third one fails with the error "error on: y", and I have no idea why.
As I see it, this should be a Class with a Member METHOD that contains a Stat(ement) RETURN with an Expr Term being an ID.
I tried commenting out and removing all the unnecessary bits, but the result is still the same. I also took a look at the parser to verify that my identifiers parse correctly, and as far as I can see they should.
Why is the y in return y not recognized here?
Is there some conflict in the grammar I am unaware of?
(Please note that I am not expecting you to fix the complete grammar; I am merely asking for the reason this is not working. I am sure there are other errors in there, but I am really stuck fixing this one.)
Here is also my makefile:
CC = gcc
LEX = flex
YAC = bison
scanner: parser.y lexer.l
$(YAC) -d -Wcounterexamples parser.y
$(LEX) lexer.l
$(CC) parser.tab.c parser.tab.h lex.yy.c -o parser
clean:
rm -f *.tab.h *.tab.c *.gch *.yy.c
rm ./parser
testing:
cat testfile3 | ./parser
First, you have one error in your grammar:
Stats: Stat SCOL Stat
| Stat
;
must be
Stats: Stat SCOL Stats
| Stat
;
(an 's' added at the end of the first line, so that a statement list can contain more than two statements)
Second, the input in testfile3 does not follow your grammar; it must be
class hi implements ho:
method myMethod(x:int)
return y
end;
end;
so the ';' after return y must be moved after the first end (and return x seems more logical, but that is another subject: you do not check the validity of the ID).
Apart from that, a class can have only one member, which is very restrictive.
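If several members per class are wanted, one possible way is to replace the single Member in Class with a list. The fragment below is only a sketch under that assumption: Members is a hypothetical nonterminal that does not exist in the question's grammar, and the rest of parser.y is assumed unchanged.

```yacc
/* Sketch only: Members is a hypothetical new nonterminal. */
Class: ID
       IMPLEMENTS ID COL
       Members
       END
     ;

/* Zero or more members, each followed by ';'.
   Left recursion keeps the parser stack shallow for long lists. */
Members: /* empty */
       | Members Member SCOL
       ;
```

With this change, testfile and the corrected testfile3 should still be accepted, and a class body such as var x:int; var y:int; end would be accepted as well.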