I'm trying to write a grammar for a DSL of mine using ANTLR4. In essence, the DSL describes function applications in a tree structure.
Currently, I'm failing to write the correct grammar (or to use the C# visitor correctly) for parsing expressions like:
#func1(jsonconfig)
#func1(jsonconfig, #func2(...))
#func1(#func2(...), #func3(...), ..., #func_n(...))
#func1(jsonconfig, #func2(...), #func3(...), ..., #func_n(...))
My grammar (with some parts removed for brevity):
func
: FUNCTION_START IDENTIFIER LPAREN (config?) (argumentList?) RPAREN
;
argument
: func
;
argumentList
: (ARG_SEPARATOR argument)+
| ARG_SEPARATOR? argument
;
config
: json
;
However, when I parse an expression I get only the first argument, not the rest.
This is my visitor:
public class DslVisitor : JustDslBaseVisitor<Instruction>
{
    public override Instruction VisitFunc(JustDslParser.FuncContext context)
    {
        var name = context.IDENTIFIER().GetText();
        var conf = context.config()?.GetText();
        var arguments = context.argumentList()?.argument() ?? Array.Empty<JustDslParser.ArgumentContext>();
        var instruction = new Instruction
        {
            Name = name,
            Config = conf == null ? null : JObject.Parse(conf),
            Bindings = arguments.Select(x => x.Accept(this)).ToList()
        };
        return instruction;
    }
    public override Instruction VisitArgument(JustDslParser.ArgumentContext context)
    {
        return context.func().Accept(this);
    }
}
I think there is probably a syntax error in the ANTLR definition, because it fails to parse a list but successfully parses a single item.
In the past I had a slightly different syntax, but it required me to always pass a config object which doesn't fit my needs.
Thanks!
Your code has a few problems.
First, you don't check the parse result after the parse. You should add an ErrorListener and test whether the lexer and/or parser actually found errors. You can also use it to redirect the error output wherever you like.
using System.IO;
using Antlr4.Runtime;

public class ErrorListener<S> : ConsoleErrorListener<S>
{
    // Set to true as soon as the lexer or parser reports a syntax error.
    public bool had_error;

    public override void SyntaxError(TextWriter output, IRecognizer recognizer, S offendingSymbol, int line,
        int col, string msg, RecognitionException e)
    {
        had_error = true;
        base.SyntaxError(output, recognizer, offendingSymbol, line, col, msg, e);
    }
}
Simply create a listener, call AddErrorListener() on the parser, call the parse method, then test had_error on the listener. Note that you should add a listener to the lexer as well.
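As a rough sketch of that wiring (not a drop-in implementation): the helper name ParseOrThrow is hypothetical, JustDslLexer is assumed to be the lexer class ANTLR generates alongside your JustDslParser, and func() is the entry rule from the question's grammar. Swap in start() once your grammar has an EOF-terminated entry rule.

```csharp
using System;
using Antlr4.Runtime;

public static class ParseCheck
{
    // Hypothetical helper: parses the text and throws if either stage reported an error.
    public static JustDslParser.FuncContext ParseOrThrow(string text)
    {
        // Assumption: JustDslLexer is the generated lexer for the JustDsl grammar.
        var lexer = new JustDslLexer(new AntlrInputStream(text));
        var lexerErrors = new ErrorListener<int>();      // lexer symbols are ints
        lexer.RemoveErrorListeners();                    // drop the default console listener
        lexer.AddErrorListener(lexerErrors);

        var parser = new JustDslParser(new CommonTokenStream(lexer));
        var parserErrors = new ErrorListener<IToken>();  // parser symbols are tokens
        parser.RemoveErrorListeners();
        parser.AddErrorListener(parserErrors);

        // Entry rule from the question; use start() if the grammar defines it.
        var tree = parser.func();

        if (lexerErrors.had_error || parserErrors.had_error)
            throw new InvalidOperationException("Input contains syntax errors.");

        return tree;
    }
}
```

Removing the default listeners first avoids printing each message twice, since ErrorListener already forwards to the console through its base class.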
Second, it took a lot of editing of the C# code to get the input into the form people expect. I removed the C# escapes and reformatted it to get this input:
#obj(
#property(
#unit(
{"value":"phoneNumbers"}
),
#agr_obj(
#valueof(
{"path":"$.phone_numbers"}
),
#current(
#valueof(
{"path":"$.type"}
) ),
#current(
#valueof(
{"path":"$.number"}
) ) ) ),
#property(
#unit(
{"value":"addrs"}
),
#agr_obj(
#valueof(
{"path":"$.addresses"}
),
#current(
#valueof(
{"path":"$.type"}
) ),
#current(
#obj(
#property(
#unit(
{"value":"city"}
),
#valueof(
{"path":"$.city"}
) ),
#property(
#unit(
{"value":"country"}
),
#valueof(
{"path":"$.country"}
) ),
#property(
#unit(
{"value":"street"}
),
#str_join(
{"separator":", "},
#valueof(
{"path":"$.street1"}
),
#valueof(
{"path":"$.street2"}
) ) ) ) ) ) ) )
Third, your grammar has no entry rule that ends with EOF. An EOF-terminated rule forces the parser to consume all the input; without one, the parser may stop early and leave trailing input unread. Here, I just added a rule named "start":
start : func EOF ;
You will need to change your entry point to start() rather than func().
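Finally, your grammar does not handle the first argument correctly. Your argumentList matches either one or more ", argument" groups or a single argument, so a list of func arguments with no leading json stops after its first item. Since the first arg of a func can be json, or json , func, or func, you need to make an exception for the first arg. This grammar fixes that: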
grammar JustDsl;
LPAREN: '(';
RPAREN: ')';
FUNCTION_START: '#';
ARG_SEPARATOR: ',';
IDENTIFIER
: [a-zA-Z] [a-zA-Z\-_] *
;
start : func EOF ;
func
: FUNCTION_START IDENTIFIER LPAREN argumentList? RPAREN
;
argument
: func
;
argumentList
: (config config_rest)?
| no_config_rest?
;
config_rest
: (ARG_SEPARATOR argument)*
;
no_config_rest
: argument (ARG_SEPARATOR argument)*
;
config
: json
;
json
: value
;
obj
: '{' pair (',' pair)* '}'
| '{' '}'
;
pair
: STRING ':' value
;
arr
: '[' value (',' value)* ']'
| '[' ']'
;
value
: STRING
| NUMBER
| obj
| arr
| 'true'
| 'false'
| 'null'
;
STRING
: '"' (ESC | SAFECODEPOINT)* '"'
;
fragment ESC
: '\\' (["\\/bfnrt] | UNICODE)
;
fragment UNICODE
: 'u' HEX HEX HEX HEX
;
fragment HEX
: [0-9a-fA-F]
;
fragment SAFECODEPOINT
: ~ ["\\\u0000-\u001F]
;
NUMBER
: '-'? INT ('.' [0-9] +)? EXP?
;
fragment INT
: '0' | [1-9] [0-9]*
;
// no leading zeros
fragment EXP
: [Ee] [+\-]? INT
;
// \- since - means "range" inside [...]
WS
: [ \t\n\r] + -> skip
;
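Note that this grammar moves config inside argumentList and splits the arguments between config_rest and no_config_rest, so the visitor from the question needs to follow. A minimal sketch, assuming the accessor names ANTLR generates from the rule names above and the same Instruction type (Name, Config, Bindings) and JObject usage as in the question:

```csharp
using System;
using System.Linq;
using Newtonsoft.Json.Linq;

public class DslVisitor : JustDslBaseVisitor<Instruction>
{
    // Unwrap the EOF-terminated entry rule; the default implementation would
    // return the result of the last child (the EOF token), which is null.
    public override Instruction VisitStart(JustDslParser.StartContext context)
    {
        return context.func().Accept(this);
    }

    public override Instruction VisitFunc(JustDslParser.FuncContext context)
    {
        var name = context.IDENTIFIER().GetText();
        var list = context.argumentList();   // null for "#f()"

        // config is now a child of argumentList, not of func.
        var conf = list?.config()?.GetText();

        // Arguments live in config_rest (after a config) or no_config_rest.
        var arguments = list?.config_rest()?.argument()
                     ?? list?.no_config_rest()?.argument()
                     ?? Array.Empty<JustDslParser.ArgumentContext>();

        return new Instruction
        {
            Name = name,
            Config = conf == null ? null : JObject.Parse(conf),
            Bindings = arguments.Select(x => x.Accept(this)).ToList()
        };
    }

    public override Instruction VisitArgument(JustDslParser.ArgumentContext context)
    {
        return context.func().Accept(this);
    }
}
```

Calling Visit on the tree returned by start() should then yield the root Instruction with all nested bindings.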
Mike was on the right track (but had a typo in the functionArgs rule). Without the input, though, this problem was difficult to solve.