
Parsing function arguments without argument delimiters


I'm working on a project where I need to translate NetLogo to another programming language. I'm using Boost Spirit and I've already implemented some of the project's grammars, which store simple code constructs in an AST.

The problem I'm facing is that right now I can't tell if an identifier is a variable name or a function name. Also, I don't know if a specific function call needs one, two, or multiple arguments, so I don't know when to stop looking for more arguments.

For example, a function call can look like

id1 id2 id3 id4

That could be:

  • id3 is a function that takes id4 as its argument (let's say its return value is id5), and id1 is a function that takes id2 and id5 as arguments

But it could also be:

  • id1 has id2 id3 id4 as arguments (all but id1 are variable names)

I've thought about using Symbols and adding a new entry each time a variable or function is declared. This would help to differentiate variable names from function names, but...

  • How can/should I store the number of arguments a function requires using Boost Spirit? Maybe using another Symbol table with Semantic Actions while parsing the function definition?
  • Once I know how to get the number of arguments needed, how can I get that value once I find a function identifier while parsing an expression?
  • Is it a good solution to use Symbols to differentiate variable names from function names?

Solution

  • In the end, this is what I did:

    • Created a Symbols table with the function name as the key and the number of arguments as the stored value.
    qi::symbols<char, int> f_args;
    
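    • Used a Semantic Action on the function parser to get the function name and argument list, and passed them to an external function that stores the data in the Symbols table.
    // Called from the semantic action below: registers the function's arity.
    void store_function (const std::string& name, const std::list<std::string>& args) {
        f_args.add(name, static_cast<int>(args.size()));  // key: name, value: argument count
        std::cout << name << " " <<  args.size() << std::endl;  // debug trace
    }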
    
    function_ = (
            lexeme[(string("to-report") | string("to")) >> !(alnum | '_')]  // make sure we have whole words
       >   identifier 
       >   ('[' > argument_list > ']')
       >   body
       >   lexeme[string("end") >> !(alnum | '_')]
    ) [ phx::bind(&store_function, _2, _3) ];
    
    • When a function name is found outside a function definition (meaning it is a function call), I look up the argument count stored in the Symbols table and pass it to a repeat directive, so the parser expects exactly the number of arguments the function needs.
    function_call = 
        function_name >> 
        repeat( phx::ref(n_args) )[identifier];
    
    function_name = 
        !lexeme[keywords >> !(alnum | '_')] >> 
        &lexeme[f_args [phx::ref(n_args) = _1] >> !(alnum | '_')] >> 
        raw[lexeme[(alpha | '_') >> *(alnum | '_' | '-')]];
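
    The same arity-lookup idea can be sketched without Spirit, which may help to see how it resolves the two readings of `id1 id2 id3 id4` from the question. This is only a minimal illustration under assumptions: `ArityTable`, `tokenize`, `parse_expr` and `parse_line` are hypothetical names, the table is filled by hand rather than from parsed definitions, and unlike the `repeat(...)[identifier]` rule above it lets an argument be a nested call.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical arity table: function name -> number of arguments.
// Identifiers that are not in the table are treated as variables.
using ArityTable = std::map<std::string, std::size_t>;

// Split whitespace-separated input into tokens.
std::vector<std::string> tokenize(const std::string& src) {
    std::istringstream in(src);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);
    return tokens;
}

// Parse one expression starting at tokens[pos] and return it in
// parenthesized form. A known function consumes exactly as many
// sub-expressions as its arity; any other identifier is a variable.
std::string parse_expr(const std::vector<std::string>& tokens,
                       std::size_t& pos, const ArityTable& arity) {
    if (pos >= tokens.size())
        throw std::runtime_error("unexpected end of input");
    const std::string name = tokens[pos++];
    const auto it = arity.find(name);
    if (it == arity.end()) return name;  // variable reference
    std::string out = name + "(";
    for (std::size_t i = 0; i < it->second; ++i) {
        if (i > 0) out += ", ";
        out += parse_expr(tokens, pos, arity);  // an argument may be a call too
    }
    return out + ")";
}

// Parse a whole line and require that every token was consumed.
std::string parse_line(const std::string& src, const ArityTable& arity) {
    const std::vector<std::string> tokens = tokenize(src);
    std::size_t pos = 0;
    const std::string result = parse_expr(tokens, pos, arity);
    if (pos != tokens.size())
        throw std::runtime_error("trailing tokens after expression");
    return result;
}
```

    With a table where id1 takes two arguments and id3 takes one, `parse_line("id1 id2 id3 id4", table)` produces the first reading, `id1(id2, id3(id4))`. With a table where only id1 is a function and takes three arguments, the same input produces `id1(id2, id3, id4)`.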
    

    The only question this answer does not address is the last one: whether Symbols are a good way to differentiate variable names from function names. Hopefully, someone with more experience in this field can explain it.