ANTLR4 grammar for a function with a variadic argument list

I'm trying to write a grammar for a DSL of mine using ANTLR4. In essence, the DSL describes function applications in a tree structure.

Currently, I'm failing to write the correct grammar (or to use the visitor in C# correctly) for parsing expressions like

#func1(jsonconfig)
#func1(jsonconfig, #func2(...))
#func1(#func2(...), #func3(...), ..., #func_n(...))
#func1(jsonconfig, #func2(...), #func3(...), ..., #func_n(...))

My grammar (with some parts removed for brevity):

func
    : FUNCTION_START IDENTIFIER LPAREN (config?) (argumentList?) RPAREN
    ;

argument
   : func
   ;

argumentList
   : (ARG_SEPARATOR argument)+
   | ARG_SEPARATOR? argument
   ;

config
   : json
   ;

However, when I parse an expression I get only the first argument, not the rest.

This is my visitor:

public class DslVisitor : JustDslBaseVisitor<Instruction>
{
    public override Instruction VisitFunc(JustDslParser.FuncContext context)
    {
        var name = context.IDENTIFIER().GetText();

        var conf = context.config()?.GetText();
        var arguments = context.argumentList()?.argument() ?? Array.Empty<JustDslParser.ArgumentContext>();

        var instruction = new Instruction
        {
            Name = name,
            Config = conf == null ? null : JObject.Parse(conf),
            Bindings = arguments.Select(x => x.Accept(this)).ToList()
        };

        return instruction;
    }

    public override Instruction VisitArgument(JustDslParser.ArgumentContext context)
    {
        return context.func().Accept(this);
    }
}

I think there is probably a mistake in the ANTLR grammar, because it fails to parse a list but successfully parses a single item. In the past I had a slightly different syntax, but it required me to always pass a config object, which doesn't fit my needs.

Thanks!

Solution

Your code has a few problems.

First, you never check the result of the parse. You should add an ErrorListener and test whether the lexer and/or parser actually reported errors. You can also use it to redirect the error output wherever you like.

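using System.IO;
using Antlr4.Runtime;

// Records whether any syntax error was reported, then defers to the
// default console listener so the message is still written out.
public class ErrorListener<S> : ConsoleErrorListener<S>
{
    public bool had_error;

    public override void SyntaxError(TextWriter output, IRecognizer recognizer, S offendingSymbol, int line,
        int col, string msg, RecognitionException e)
    {
        had_error = true;
        base.SyntaxError(output, recognizer, offendingSymbol, line, col, msg, e);
    }
}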

Create a listener, register it with AddErrorListener() on the parser, call the parse method, then check the listener's had_error. Note that you should add a listener to the lexer as well.
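
The wiring described above might look like the sketch below. It assumes the generated classes are named JustDslLexer and JustDslParser (the lexer name is inferred from the grammar name, not shown in the question); DslParsing and ParseOrThrow are hypothetical names used only for illustration.

```csharp
using System;
using Antlr4.Runtime;

public static class DslParsing
{
    // Hypothetical helper: parses `text` and throws if the lexer or the
    // parser reported at least one syntax error.
    public static JustDslParser.FuncContext ParseOrThrow(string text)
    {
        // JustDslLexer is an assumed name for the generated lexer class.
        var lexer = new JustDslLexer(new AntlrInputStream(text));
        var parser = new JustDslParser(new CommonTokenStream(lexer));

        // The lexer reports errors on int symbols, the parser on IToken.
        var lexerErrors = new ErrorListener<int>();
        var parserErrors = new ErrorListener<IToken>();

        // Replace the default listeners so each message is printed only once.
        lexer.RemoveErrorListeners();
        lexer.AddErrorListener(lexerErrors);
        parser.RemoveErrorListeners();
        parser.AddErrorListener(parserErrors);

        // Use whichever rule is your entry point; the question calls func().
        var tree = parser.func();

        // Tokens are produced while parsing, so check both flags afterwards.
        if (lexerErrors.had_error || parserErrors.had_error)
            throw new FormatException("The DSL input contains syntax errors.");

        return tree;
    }
}
```

Checking the flags only after the parse call matters, because the lexer does its work lazily as the parser asks for tokens.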

Second, it took a lot of editing of the C# code to recover the input you are actually parsing. I removed the C# string escapes and reformatted it to get this input:

#obj(
  #property(
    #unit(
      {"value":"phoneNumbers"}
    ),
    #agr_obj(
      #valueof(
        {"path":"$.phone_numbers"}
      ),
      #current(
        #valueof(
          {"path":"$.type"}
        )
      ),
      #current(
        #valueof(
          {"path":"$.number"}
        )
      )
    )
  ),
  #property(
    #unit(
      {"value":"addrs"}
    ),
    #agr_obj(
      #valueof(
        {"path":"$.addresses"}
      ),
      #current(
        #valueof(
          {"path":"$.type"}
        )
      ),
      #current(
        #obj(
          #property(
            #unit(
              {"value":"city"}
            ),
            #valueof(
              {"path":"$.city"}
            )
          ),
          #property(
            #unit(
              {"value":"country"}
            ),
            #valueof(
              {"path":"$.country"}
            )
          ),
          #property(
            #unit(
              {"value":"street"}
            ),
            #str_join(
              {"separator":", "},
              #valueof(
                {"path":"$.street1"}
              ),
              #valueof(
                {"path":"$.street2"}
              )
            )
          )
        )
      )
    )
  )
)

Third, your grammar has no entry rule that ends with EOF. An EOF-terminated rule forces the parser to consume all the input; without it, the parser can stop after the first func it recognizes and ignore whatever follows. Here, I just added the rule "start":

start : func EOF ;

You will need to change your entry point to start() rather than func().
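
One thing to watch for on the visitor side, as far as I can tell: the generated base visitor's default for a rule is to visit all children and return the result of the last one, which for start is the EOF token and therefore null. A small override avoids that. The sketch below assumes the DslVisitor from the question; StartAwareDslVisitor is a hypothetical name, and the override could just as well be added to DslVisitor directly.

```csharp
// Hypothetical subclass of the question's DslVisitor; only adds handling
// for the new "start" entry rule.
public class StartAwareDslVisitor : DslVisitor
{
    public override Instruction VisitStart(JustDslParser.StartContext context)
    {
        // Return the result for the func child and ignore the EOF token.
        return context.func().Accept(this);
    }
}
```

With that in place, the call site would be something like new StartAwareDslVisitor().Visit(parser.start()).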

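Finally, your argumentList rule does not cover every shape of argument list. Its first alternative requires a leading separator, so it only fits after a config, and its second alternative matches exactly one argument. A call with several func arguments and no config, such as #func1(#func2(...), #func3(...)), therefore stops matching after the first argument. Since the arguments of a func can be json alone, json followed by funcs, or funcs only, you need to treat the first arg as a special case. This grammar fixes that: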

grammar JustDsl;

LPAREN:             '(';
RPAREN:             ')';
FUNCTION_START:     '#';
ARG_SEPARATOR:      ',';

IDENTIFIER
    : [a-zA-Z] [a-zA-Z\-_] *
    ;

start : func EOF ;

func
    : FUNCTION_START IDENTIFIER LPAREN argumentList? RPAREN
    ;

argument
   : func
   ;

argumentList
   : (config config_rest)?
   | no_config_rest?
   ;

config_rest
   : (ARG_SEPARATOR argument)*
   ;

no_config_rest
   : argument (ARG_SEPARATOR argument)*
   ;

config
   : json
   ;

json
   : value
   ;

obj
   : '{' pair (',' pair)* '}'
   | '{' '}'
   ;

pair
   : STRING ':' value
   ;

arr
   : '[' value (',' value)* ']'
   | '[' ']'
   ;

value
   : STRING
   | NUMBER
   | obj
   | arr
   | 'true'
   | 'false'
   | 'null'
   ;


STRING
   : '"' (ESC | SAFECODEPOINT)* '"'
   ;


fragment ESC
   : '\\' (["\\/bfnrt] | UNICODE)
   ;
fragment UNICODE
   : 'u' HEX HEX HEX HEX
   ;
fragment HEX
   : [0-9a-fA-F]
   ;
fragment SAFECODEPOINT
   : ~ ["\\\u0000-\u001F]
   ;


NUMBER
   : '-'? INT ('.' [0-9] +)? EXP?
   ;


fragment INT
   : '0' | [1-9] [0-9]*
   ;

// no leading zeros

fragment EXP
   : [Ee] [+\-]? INT
   ;

// \- since - means "range" inside [...]

WS
   : [ \t\n\r] + -> skip
   ;
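
Because config now lives inside argumentList, and the func arguments sit under config_rest or no_config_rest, the visitor from the question has to look one level deeper. A minimal sketch of the adjusted visitor follows; it assumes the C# target generates accessors named after the rules (config_rest(), no_config_rest()) and that Instruction is the asker's own type with Name, Config and Bindings, as in the question.

```csharp
using System;
using System.Linq;
using Newtonsoft.Json.Linq;

public class DslVisitor : JustDslBaseVisitor<Instruction>
{
    public override Instruction VisitFunc(JustDslParser.FuncContext context)
    {
        var name = context.IDENTIFIER().GetText();
        var list = context.argumentList();

        // The optional config is now the first element of argumentList.
        var conf = list?.config()?.GetText();

        // Arguments follow the config (config_rest) or, when there is no
        // config, form the whole list (no_config_rest).
        var arguments = list?.config_rest()?.argument()
                        ?? list?.no_config_rest()?.argument()
                        ?? Array.Empty<JustDslParser.ArgumentContext>();

        return new Instruction
        {
            Name = name,
            Config = conf == null ? null : JObject.Parse(conf),
            Bindings = arguments.Select(x => x.Accept(this)).ToList()
        };
    }

    public override Instruction VisitArgument(JustDslParser.ArgumentContext context)
    {
        return context.func().Accept(this);
    }
}
```

Note that JObject.Parse only accepts a JSON object, while the json rule also admits strings, numbers and arrays, so a config that is not an object would still throw here.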

Mike was on the right track (but had a typo in the functionArgs rule). Without the input, though, this problem was difficult to solve.