I am trying to build a parser but I can't seem to understand how it works. I need help with someone pointing me in the right direction so I can pick it up from there.
SO I have a tokenizer and a lexer.
Tokenizer output:
[int,add,(,int,a,,,int,b,),=,a,+,b,int,letin,(,int,a,),=,let,b,=,10,in,add,(,a,,,b,),int,equal,(,int,a,,,int,b,),=,if,a,==,b,then,letin,(,a,),else,1,int,main,(,int,input,),=,equal,(,input,,,2,)]
Lexer Output:
[TYPE_INT,IDENTIFIER,OPEN_P,TYPE_INT,IDENTIFIER,COMMA,TYPE_INT,IDENTIFIER,CLOSE_P,ASSIGN,IDENTIFIER,ARITH_ADD,IDENTIFIER,TYPE_INT,IDENTIFIER,OPEN_P,TYPE_INT,IDENTIFIER,CLOSE_P,ASSIGN,LET,IDENTIFIER,ASSIGN,IDENTIFIER,LET_IN,IDENTIFIER,OPEN_P,IDENTIFIER,COMMA,IDENTIFIER,CLOSE_P,TYPE_INT,IDENTIFIER,OPEN_P,TYPE_INT,IDENTIFIER,COMMA,TYPE_INT,IDENTIFIER,CLOSE_P,ASSIGN,COND_IF,IDENTIFIER,LOGIC_EQ,IDENTIFIER,COND_THEN,IDENTIFIER,OPEN_P,IDENTIFIER,CLOSE_P,COND_ELSE,INTEGER,TYPE_INT,IDENTIFIER,OPEN_P,TYPE_INT,IDENTIFIER,CLOSE_P,ASSIGN,IDENTIFIER,OPEN_P,IDENTIFIER,COMMA,INTEGER,CLOSE_P]
Now I have to build a parser. I don't get how to start and how to include parameterized constructs.
My rules are to be something like this.
program --> functionList.
functionList --> function,functionListCollection.
functionListCollection --> functionList | [].
function --> typeID(typeIdList),[=],expression.
typeID --> [int],[id] | [bool],[id].
typeIdList --> typeID,typeIdListCollection.
typeIdListCollection --> [,], typeIdList | [].
expression --> [if], comparison, [then], value, [else], value.
expression --> [let],[id],[=], value, [in], expression.
expression --> value, extraExpression.
extraExpression --> arithmetic | [].
arithmetic --> [+], value | [-], value.
comparison --> value, comparisonRight.
comparisonRight --> [=],[=],value.
comparisonRight --> [!], [=], value.
comparisonRight --> [>], value.
comparisonRight --> [>], [=], value.
value --> [number].
value --> [id], valueParameters.
valueParameters --> [(],parameters,[)]. | [].
parameters --> value, parametersList.
parametersList --> [,], parameters | [].
I am looking a build a predicate that takes the lexed list and gives the list out of the parser. I will then replace the numbers and identifiers by looking at the token list. Some help on how to start would be much appreciated. Thank you.
your 'Lexer Output' seems not only useless, but actually dangerous. It (arbitrarly?) renames important symbols (like int
) that are already part of the parser (btw, tokenizer and lexer are synonyms afaik). So, just feed your DCG (once corrected, see below) with tokenizer output:
tokens_test :-
Tokens = [
int,add,=,a,+,33
% ...more source code to test - but please use writeq to show it ...
%,int,let,in,'(',int,a,')',=,let,b,=,10
%,in,add,'(',a,',',b,')',int,equal,'(',int,a,',',int,b,')',=
%,if,a,==,b,then,let,in,'(',a,')',else,1
%,int,main,'(',int,input,')',=,equal,'(',input,',',2,')'
], phrase(program, Tokens).
?- tokens_test.
true
I changed your DCG, since you had [id] and [number] as terminals:
id --> [I], {atom(I)}.
number --> [N], {number(N)}.
program --> functionList.
functionList --> function,functionListCollection.
functionListCollection --> functionList | [].
function --> typeID,[=],expression.
typeID --> [int],id | [bool],id.
typeIdList --> typeID,typeIdListCollection.
typeIdListCollection --> [,], typeIdList | [].
expression --> [if], comparison, [then], value, [else], value.
expression --> [let], id, [=], value, [in], expression.
expression --> value, extraExpression.
extraExpression --> arithmetic | [].
arithmetic --> [+], value | [-], value.
comparison --> value, comparisonRight.
comparisonRight --> [=],[=],value.
comparisonRight --> [!], [=], value.
comparisonRight --> [>], value.
comparisonRight --> [>], [=], value.
value --> number.
value --> id, valueParameters.
valueParameters --> ['('],parameters,[')'] | [].
parameters --> value, parametersList.
parametersList --> [,], parameters | [].
Where you have a separator (the comma, below), you can avoid service predicates like
typeIdList --> typeID,typeIdListCollection.
typeIdListCollection --> [,], typeIdList | [].
that could be
typeIdList --> typeID, ( [,], typeIdList | [] ).
but the convenience of this 'simplification' could depends on other factors.
how to include parameterized constructs
id//0 and number//0 are non terminals that should 'get back' their value: they become id//1 and number//1, and conventionally, we used to 'tag' the value with the non terminal (so we spare symbols :-)
id(id(I)) --> [I], % better to filter out keywords here...
{atom(I), \+memberchk(I,[let, etc etc])}.
number(number(N,integer)) --> [N], {integer(N)}.
number(number(N,float)) --> [N], {float(N)}.
now where they are called, we must change to
...
typeID --> [int],id(_) | [bool],id(_).
...
it's obvious that typeID//0 now should 'get back' the values (so called semantic attributes), otherwise they are lost:
...
typeID(typeID(T,I)) --> [int],id(I),{T=int} | [bool],id(I),{T=bool}.
...
that shows a better way to write typeID//1. Could be
typeID(typeID(T,I)) --> type(T),id(I).
type(int) --> [int].
type(bool) --> [bool].
Hope you get the picture :-)