I'm working on a project where I need to translate NetLogo to another programming language. I'm using Boost Spirit and have already implemented some of the grammars, which parse simple code syntax into an AST.
The problem I'm facing is that right now I can't tell if an identifier is a variable name or a function name. Also, I don't know if a specific function call needs one, two, or multiple arguments, so I don't know when to stop looking for more arguments.
For example, a function call can look like
id1 id2 id3 id4
That could be: id3 is a function that takes id4 as an argument (let's say its return value is id5), and id1 is a function that takes id2 and id5 as arguments. But it could also be: id1 takes id2, id3, and id4 as arguments (all but id1 being variable names).
I've thought about using symbols (qi::symbols) and adding a new entry each time a variable or function is declared. This would help to differentiate variable names from function names, but...
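The ambiguity goes away once the parser knows each function's arity: a token that names a known function is a call that consumes exactly that many following sub-expressions, and any other token is a variable. A minimal plain C++ sketch of that idea (not Spirit), assuming whitespace-separated tokens and a hypothetical `ArityTable` mapping function names to argument counts; `parse_expr` and `parse_line` are illustrative names, not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical arity table: function name -> number of arguments.
// Any token that is not in the table is treated as a variable.
using ArityTable = std::map<std::string, int>;

// Parses one expression starting at tokens[pos], advancing pos, and returns
// it in a bracketed form such as "id1(id2, id3(id4))".
std::string parse_expr(const std::vector<std::string>& tokens, std::size_t& pos,
                       const ArityTable& arity) {
    if (pos >= tokens.size()) throw std::runtime_error("missing argument");
    const std::string name = tokens[pos++];
    const auto it = arity.find(name);
    if (it == arity.end()) return name;  // variable: no arguments to consume
    std::string out = name + "(";
    for (int i = 0; i < it->second; ++i) {  // consume exactly arity[name] arguments
        if (i > 0) out += ", ";
        out += parse_expr(tokens, pos, arity);  // each argument may itself be a call
    }
    return out + ")";
}

// Splits a line on whitespace and parses it as a single expression.
std::string parse_line(const std::string& line, const ArityTable& arity) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);
    std::size_t pos = 0;
    std::string result = parse_expr(tokens, pos, arity);
    if (pos != tokens.size()) throw std::runtime_error("leftover tokens");
    return result;
}
```

With the table {{"id1", 2}, {"id3", 1}}, parse_line("id1 id2 id3 id4", ...) yields id1(id2, id3(id4)); with {{"id1", 3}} it yields id1(id2, id3, id4), i.e. the two readings described above. The Spirit grammar below applies the same idea through f_args and repeat.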
In the end, this is what I did:
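// Maps each function name to its number of arguments. Filled while parsing
// definitions, queried while parsing calls.
qi::symbols<char, int> f_args;
int n_args = 0; // arity of the function name matched most recently

// Called from the semantic action of function_.
void store_function(const std::string& name, const std::list<std::string>& args) {
    f_args.add(name, static_cast<int>(args.size()));
    std::cout << name << " " << args.size() << std::endl;
}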
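// to / to-report <name> [ <args> ] <body> end
// In the action, _2 is the identifier (function name) and _3 the argument_list.
function_ = (
    lexeme[(string("to-report") | string("to")) >> !(alnum | '_' | '-')] // make sure we have whole words
    > identifier
    > ('[' > argument_list > ']')
    > body
    > lexeme[string("end") >> !(alnum | '_' | '-')]
) [ phx::bind(&store_function, _2, _3) ];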
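// A call is a known function name followed by exactly as many
// arguments as were recorded for that name.
function_call =
    function_name >>
    repeat( phx::ref(n_args) )[identifier];

// '-' is part of the word-boundary check because identifiers may contain '-':
// otherwise a function "foo" would wrongly match the start of a variable "foo-bar".
function_name =
    !lexeme[keywords >> !(alnum | '_' | '-')] >>                        // not a keyword
    &lexeme[f_args [phx::ref(n_args) = _1] >> !(alnum | '_' | '-')] >> // known function; lookahead stores its arity in n_args
    raw[lexeme[(alpha | '_') >> *(alnum | '_' | '-')]];                 // the name itself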
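One consequence of filling f_args from a semantic action is that a call is only recognized if that function's definition has already been parsed, while NetLogo lets procedures be defined in any order. A common workaround is a separate pre-pass that collects all definitions first. The sketch below is plain C++ rather than Spirit; `collect_arities` is a hypothetical helper, and it assumes that no comments or string literals contain "to", "to-report", or brackets.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical pre-pass: scans the whole source once, before the real parse,
// and records "name -> number of arguments" for every to / to-report
// definition. A definition without brackets is recorded with 0 arguments.
std::map<std::string, int> collect_arities(const std::string& source) {
    // Pad brackets with spaces so "[a b]" splits into "[", "a", "b", "]".
    std::string spaced;
    for (char c : source) {
        if (c == '[' || c == ']') {
            spaced += ' ';
            spaced += c;
            spaced += ' ';
        } else {
            spaced += c;
        }
    }

    std::istringstream in(spaced);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);

    std::map<std::string, int> arities;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        if (tokens[i] != "to" && tokens[i] != "to-report") continue;
        const std::string& name = tokens[i + 1];
        int count = 0;
        std::size_t j = i + 2;
        if (j < tokens.size() && tokens[j] == "[") {
            // Every token up to the closing bracket is one argument name.
            for (++j; j < tokens.size() && tokens[j] != "]"; ++j) ++count;
        }
        arities[name] = count;
    }
    return arities;
}
```

Each entry could then be loaded with f_args.add(name, count) before the main parse, so that calls to procedures defined later in the file are recognized as well.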
The only part of my question this doesn't address is the last one. Hopefully someone with more experience in this area can explain it.