I am trying to write a LaTeX parser using lex and yacc, but I am struggling. Here is my lexer:
%{
#include "y.tab.h"
#include <stdio.h>
%}
%%
^\\begin\{.*\} {return BEG;}
%%
int yywrap() {
    return 1;
}
And here is my parser:
%{
#include <stdio.h>
#include <stdlib.h>
void yyerror(char *s);
int yylex();
extern FILE *yyin;
%}
%token BEG
%%
beg: BEG {printf("Hello world\n");}
%%
void yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
}
int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "Wrong number of arguments provided\n");
        exit(1);
    }
    yyin = fopen(argv[1], "r");
    if (!yyin) {
        fprintf(stderr, "Not a valid filename\n");
        exit(1);
    }
    yyparse();
    return 0;
}
Now, if I run this on the LaTeX snippet
\begin{document}
\begin{equation}
x = 3
\end{equation}
\end{document}
I get
Hello world
syntax error
It seems like the parser is only seeing one \begin pattern instead of two. Why is that? I really don't see why. Thank you in advance.
EDIT: I tried something like
lines: line
| lines line
;
line: beg '\n'
| ID '\n'
;
beg: BEG {printf("Hello world\n");}
;
where ID corresponds to the regex .*, but I get the same error.
Lexer:
%{
#include "y.tab.h"
#include <stdio.h>
#include <string.h>
%}
%%
^\\begin\{.*\} {return BEG;}
\n {
return *yytext;
}
%%
int yywrap() {
    return 1;
}
Parser:
%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
void yyerror(char *s);
int yylex();
extern FILE *yyin;
%}
%token BEG
%start beg
%%
beg: BEG '\n' {printf("Hello world\n");}
%%
void yyerror(char *s) {
    fprintf(stderr, "%s\n", s);
}
int main(int argc, char **argv) {
    if (argc != 2) {
        fprintf(stderr, "Wrong number of arguments provided\n");
        exit(1);
    }
    yyin = fopen(argv[1], "r");
    if (!yyin) {
        fprintf(stderr, "Not a valid filename\n");
        exit(1);
    }
    yyparse();
    return 0;
}
The code above is as much as I can remember. I would also suggest that you first write down what kind of tokens you expect, and then design the grammar based on what you actually want to do with those tokens.
In the following grammar:
lines: line
| lines line
;
line: beg '\n'
| ID '\n'
;
beg: BEG {printf("Hello world\n");}
;
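lines is the start symbol; the non-terminals are lines, line, and beg, and the terminals (tokens) are ID, BEG, and '\n'. This grammar only makes sense if your lexer actually returns all of these tokens: a '\n' token at the end of every line and an ID token for every line that is not a \begin line. Your original lexer returns only BEG, so everything else is echoed to the output by lex's default rule instead of reaching the parser. If you add an ID rule for .*, list it after the \begin rule: both patterns match a \begin line with the same length, and lex resolves such ties in favor of the rule listed first.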
The following grammar makes beg the start symbol. It matches a single BEG token followed by a single '\n' token and prints "Hello world" when it does. Once that match is complete, the start symbol is finished and the parser expects end of input, so the second \begin line still produces a syntax error.
beg: BEG '\n' {printf("Hello world\n");}