Superpower parser for nested string representation of object tree

I'm struggling to understand how recursive parsing works in Superpower. I've studied the blog posts and the examples on GitHub, but I still don't understand it.

Can somebody tell me how, starting from the tokenizer I wrote, I could build the AST with the proposed structure (see below)?

This is my goal:

I'm working with a Kuka robot. Through a TCP client I can read the content of a variable on the robot controller. The content of the variable is returned to me as a single string. I want to parse this string and populate a custom AST adapted to the robot language.

Kuka Robot Language (KRL):

In the robot language, I have the following primitive types: BOOL, INT, CHAR, REAL

I also have the ability to create custom enumerations. The value of an enumeration is preceded by '#' : ENUM

A string is represented as a CHAR array : CHAR[]

In addition, it is possible to create composite structures called STRUC. A struc aggregates field-value pairs, where a value can be a BOOL, INT, CHAR, STRING, REAL, ENUM or another STRUC: STRUC

Sample of data to parse:

Here is a typical example of the data I want to parse. It is what I get when I ask the robot for the variable progLogDb[1], the first item of progLogDb, an array of robot program logs where each item is a PROGLOG struc:

{PROGLOG: ProgName[] "any ascii string {[]%,&}", StartDate {DATE: CSEC 0.124, SEC -22, MIN 36, HOUR 16, DAY 4, MONTH 1, YEAR 2019}, EndDate {DATE: CSEC 0, SEC 36, MIN 36, HOUR 16, DAY 4, MONTH 1, YEAR 2019}, QuitDate {DATE: CSEC 0, SEC 36, MIN 36, HOUR 16, DAY 4, MONTH 1, YEAR 2019}, ActiveTime 11.00000, MyInt 10, MyReal -1.091e-24, MyCHAR "A", MyBool False, MyEnum #EnumValue}

In this sample, you can see how strucs are nested. A struc is written {Type: key-value, key-value, ...} where each value is a BOOL, INT, REAL, ENUM, STRING or STRUC. If the value is a primitive data type, its type has to be inferred during parsing.

This is the tree I want to build:

progLogDb[1] (PROGLOG:)
    - ProgName[] "any ascii string {[]%,&}"
    - StartDate (DATE:)
        - CSEC 0.124
        - SEC -22
        - MIN 36
        - HOUR 16
        - DAY 4
        - MONTH 1
        - YEAR 2019
    - EndDate (DATE:)
        - CSEC 0
        - SEC 36
        - MIN 36
        - HOUR 16
        - DAY 4
        - MONTH 1
        - YEAR 2019
    - QuitDate (DATE:)
        - CSEC 0
        - SEC 36
        - MIN 36
        - HOUR 16
        - DAY 4
        - MONTH 1
        - YEAR 2019
    - ActiveTime 11.00000
    - MyInt 10
    - MyReal -1.091e-24
    - MyCHAR "A"
    - MyBool False
    - MyEnum #EnumValue

Tokenization

So far, I believe I have the tokenization part working, with this code:

enum KrlToken
{
    // struct delimiters
    [Token(Example = "{")]
    LBracket,

    [Token(Example = "}")]
    RBracket,

    // field delimiters
    [Token(Example = ",")]
    Comma,

    // data
    Type,
    Boolean,
    Integer,
    Real,
    String,
    Enum,
    Identifier,
}

static class KrlTokenizer
{
    #region TokenParser
    static TextParser<Unit> KrlBooleanToken { get; } =
        from content in Span.EqualToIgnoreCase("false")
            .Or(Span.EqualToIgnoreCase("true"))
        select Unit.Value;

    static TextParser<Unit> KrlStringToken { get; } =
        from open in Character.EqualTo('"')
        from content in Span.EqualTo("\\\"").Value(Unit.Value).Try()
            .Or(Span.EqualTo("\\\\").Value(Unit.Value).Try())
            .Or(Character.Except('"').Value(Unit.Value))
            .IgnoreMany()
        from close in Character.EqualTo('"')
        select Unit.Value;

    static TextParser<Unit> KrlIntegerToken { get; } =
        from sign in Character.EqualTo('-').OptionalOrDefault()
        from first in Character.Digit
        from rest in Character.Digit.IgnoreMany()
        select Unit.Value;

    static TextParser<Unit> KrlRealToken { get; } =
        from sign in Character.EqualTo('-').OptionalOrDefault()
        from first in Character.Digit
        from rest in Character.Digit.Or(Character.In('.', 'e', 'E', '+', '-')).IgnoreMany()
        select Unit.Value;

    static TextParser<Unit> KrlEnumToken { get; } =
        from open in Character.EqualTo('#')
        from first in Character.Letter.Or(Character.In('_', '$'))
        from rest in Character.Letter.Or(Character.Digit).Or(Character.In('_', '$'))
            .IgnoreMany()
        select Unit.Value;

    static TextParser<Unit> KrlTypeToken { get; } =
        from first in Character.Letter.Or(Character.In('_', '$'))
        from rest in Character.Letter.Or(Character.Digit).Or(Character.In('_', '$'))
            .IgnoreMany()
        from close in Character.EqualTo(':')
        select Unit.Value;

    static TextParser<Unit> KrlIdentifierToken { get; } =
        from first in Character.Letter.Or(Character.In('_', '$'))
        from rest in Character.Letter.Or(Character.Digit).Or(Character.In('_', '$', '[', ']'))
            .IgnoreMany()
        select Unit.Value;


    #endregion

    public static Tokenizer<KrlToken> Instance { get; } =
        new TokenizerBuilder<KrlToken>()
            .Ignore(Span.WhiteSpace)
            .Match(Character.EqualTo('{'), KrlToken.LBracket)
            .Match(Character.EqualTo('}'), KrlToken.RBracket)
            .Match(Character.EqualTo(','), KrlToken.Comma)
            .Match(KrlTypeToken, KrlToken.Type)
            .Match(KrlEnumToken, KrlToken.Enum)
            .Match(KrlStringToken, KrlToken.String)
            .Match(KrlBooleanToken, KrlToken.Boolean)
            .Match(KrlIntegerToken, KrlToken.Integer, requireDelimiters: true)
            .Match(KrlRealToken, KrlToken.Real, requireDelimiters: true)
            .Match(KrlIdentifierToken, KrlToken.Identifier, requireDelimiters: true)
            .Build();
}

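For a string like the sample, this gives me the following tokens (the ProgName text and the field order in this run differ slightly from the sample above):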

LBracket@0 (line 1, column 1): {
Type@1 (line 1, column 2): PROGLOG:
Identifier@10 (line 1, column 11): ProgName[]
String@21 (line 1, column 22): "lgocell_mdi{} {[]%,&}"
Comma@44 (line 1, column 45): ,
Identifier@46 (line 1, column 47): StartDate
LBracket@56 (line 1, column 57): {
Type@57 (line 1, column 58): DATE:
Identifier@63 (line 1, column 64): CSEC
Real@68 (line 1, column 69): 0.124
Comma@73 (line 1, column 74): ,
Identifier@75 (line 1, column 76): SEC
Integer@79 (line 1, column 80): -22
Comma@82 (line 1, column 83): ,
Identifier@84 (line 1, column 85): MIN
Integer@88 (line 1, column 89): 36
Comma@90 (line 1, column 91): ,
Identifier@92 (line 1, column 93): HOUR
Integer@97 (line 1, column 98): 16
Comma@99 (line 1, column 100): ,
Identifier@101 (line 1, column 102): DAY
Integer@105 (line 1, column 106): 4
Comma@106 (line 1, column 107): ,
Identifier@108 (line 1, column 109): MONTH
Integer@114 (line 1, column 115): 1
Comma@115 (line 1, column 116): ,
Identifier@117 (line 1, column 118): YEAR
Integer@122 (line 1, column 123): 2019
RBracket@126 (line 1, column 127): }
Comma@127 (line 1, column 128): ,
Identifier@129 (line 1, column 130): EndDate
LBracket@137 (line 1, column 138): {
Type@138 (line 1, column 139): DATE:
Identifier@144 (line 1, column 145): CSEC
Integer@149 (line 1, column 150): 0
Comma@150 (line 1, column 151): ,
Identifier@152 (line 1, column 153): SEC
Integer@156 (line 1, column 157): 36
Comma@158 (line 1, column 159): ,
Identifier@160 (line 1, column 161): MIN
Integer@164 (line 1, column 165): 36
Comma@166 (line 1, column 167): ,
Identifier@168 (line 1, column 169): HOUR
Integer@173 (line 1, column 174): 16
Comma@175 (line 1, column 176): ,
Identifier@177 (line 1, column 178): DAY
Integer@181 (line 1, column 182): 4
Comma@182 (line 1, column 183): ,
Identifier@184 (line 1, column 185): MONTH
Integer@190 (line 1, column 191): 1
Comma@191 (line 1, column 192): ,
Identifier@193 (line 1, column 194): YEAR
Integer@198 (line 1, column 199): 2019
RBracket@202 (line 1, column 203): }
Comma@203 (line 1, column 204): ,
Identifier@205 (line 1, column 206): QuitDate
LBracket@214 (line 1, column 215): {
Type@215 (line 1, column 216): DATE:
Identifier@221 (line 1, column 222): CSEC
Integer@226 (line 1, column 227): 0
Comma@227 (line 1, column 228): ,
Identifier@229 (line 1, column 230): SEC
Integer@233 (line 1, column 234): 36
Comma@235 (line 1, column 236): ,
Identifier@237 (line 1, column 238): MIN
Integer@241 (line 1, column 242): 36
Comma@243 (line 1, column 244): ,
Identifier@245 (line 1, column 246): HOUR
Integer@250 (line 1, column 251): 16
Comma@252 (line 1, column 253): ,
Identifier@254 (line 1, column 255): DAY
Integer@258 (line 1, column 259): 4
Comma@259 (line 1, column 260): ,
Identifier@261 (line 1, column 262): MONTH
Integer@267 (line 1, column 268): 1
Comma@268 (line 1, column 269): ,
Identifier@270 (line 1, column 271): YEAR
Integer@275 (line 1, column 276): 2019
RBracket@279 (line 1, column 280): }
Comma@280 (line 1, column 281): ,
Identifier@282 (line 1, column 283): ActiveTime
Real@293 (line 1, column 294): 11.00000
Comma@301 (line 1, column 302): ,
Identifier@303 (line 1, column 304): MyEnum
Enum@310 (line 1, column 311): #EnumValue
Comma@320 (line 1, column 321): ,
Identifier@322 (line 1, column 323): MyInt
Integer@328 (line 1, column 329): 10
Comma@330 (line 1, column 331): ,
Identifier@332 (line 1, column 333): MyReal
Real@339 (line 1, column 340): -1.091e-24
Comma@349 (line 1, column 350): ,
Identifier@351 (line 1, column 352): MyChar
String@358 (line 1, column 359): "A"
Comma@361 (line 1, column 362): ,
Identifier@363 (line 1, column 364): MyBool
Boolean@370 (line 1, column 371): False
RBracket@375 (line 1, column 376): }

Parsing into AST

Now that the tokenization looks good, I want to parse the tokens into a custom AST: associate field-value pairs, infer the primitive types, and recreate the proper nesting of strucs. Any help on this part would be appreciated.

public enum DataType
{
    BOOL,
    INT,
    REAL,
    STRING,
    ENUM,
    STRUC
}

public abstract class Data
{
    private static Regex _array = new Regex(@"\[([\d]+)\]", RegexOptions.IgnoreCase);

    public abstract DataType Type { get; }
    public string Name { get; set; }

    public bool IsScalar { get => Type != DataType.STRUC; }
    public bool IsComposite { get => Type == DataType.STRUC; }
    public bool IsArrayElement(out short index)
    {
        index = 0;
        Match match = _array.Match(Name);
        if (match.Success)
        {
            index = short.Parse(match.Groups[1].Value);
            return true;
        }
        else
        {
            return false;
        }
    }


}

public class BoolData : Data
{
    public override DataType Type => DataType.BOOL;
    public bool Value { get; private set; }
    public BoolData(string name, bool value)
    {
        Name = name;
        Value = value;
    }
}
public class IntData : Data
{
    public override DataType Type => DataType.INT;
    public short Value { get; private set; }
    public IntData(string name, short value)
    {
        Name = name;
        Value = value;
    }
}
public class RealData : Data
{
    public override DataType Type => DataType.REAL;
    public double Value { get; private set; }
    public RealData(string name, double value)
    {
        Name = name;
        Value = value;
    }
}
public class StringData : Data
{
    public override DataType Type => DataType.STRING;
    public string Value { get; private set; }
    public StringData(string name, string value)
    {
        Name = name;
        Value = value;
    }
}
public class EnumData : Data
{
    public override DataType Type => DataType.ENUM;
    public string Value { get; private set; }
    public EnumData(string name, string value)
    {
        Name = name;
        Value = value;
    }
}
public class StrucData : Data
{
    public override DataType Type => DataType.STRUC;
    public List<Data> Value = new List<Data>();

    public StrucData(string name)
    {
        Name = name;
        Value = new List<Data>();
    }
    public void Add(Data data) => Value.Add(data);
}

Solution

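You need a parser for each Data class you've defined. The primitive ones are straightforward: each matches an Identifier token followed by one value token. The StrucData parser is the one that has to be recursive. Inside the braces it tries each primitive parser in turn, and if none of them matches it tries to parse a nested StrucData by referring to itself. Every alternative starts by consuming the Identifier token, so Try() is needed to let a failed alternative backtrack before Or() moves on to the next one. The self-reference works because every from clause after the first is compiled into a lambda, so StrucParser is only read when parsing actually runs, by which time the field has been assigned. ManyDelimitedBy then collects the comma-separated fields into an array, which ToList() turns into the List<Data> that StrucData stores.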

Try this:

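// Not public: KrlToken is declared without an access modifier in the question, so it is
// internal, and a public class could not expose public fields typed with it (error CS0052).
static class KrlParsers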
{
    public static TokenListParser<KrlToken, BoolData> BoolParser =
        from id in Token.EqualTo(KrlToken.Identifier)
        from val in Token.EqualTo(KrlToken.Boolean)
        select new BoolData(id.ToStringValue(), bool.Parse(val.ToStringValue()));

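    // Numbers are parsed with the invariant culture so that '.' and '-' are read
    // the same way whatever the regional settings of the machine are.
    public static TokenListParser<KrlToken, IntData> IntParser =
        from id in Token.EqualTo(KrlToken.Identifier)
        from val in Token.EqualTo(KrlToken.Integer)
        select new IntData(id.ToStringValue(),
            short.Parse(val.ToStringValue(), System.Globalization.CultureInfo.InvariantCulture));

    public static TokenListParser<KrlToken, RealData> RealParser =
        from id in Token.EqualTo(KrlToken.Identifier)
        from val in Token.EqualTo(KrlToken.Real)
        select new RealData(id.ToStringValue(),
            double.Parse(val.ToStringValue(), System.Globalization.CultureInfo.InvariantCulture));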

    public static TokenListParser<KrlToken, StringData> StringParser =
        from id in Token.EqualTo(KrlToken.Identifier)
        from val in Token.EqualTo(KrlToken.String)
        select new StringData(id.ToStringValue(), val.ToStringValue());

    public static TokenListParser<KrlToken, EnumData> EnumParser =
        from id in Token.EqualTo(KrlToken.Identifier)
        from val in Token.EqualTo(KrlToken.Enum)
        select new EnumData(id.ToStringValue(), val.ToStringValue());

    public static TokenListParser<KrlToken, StrucData> StrucParser =
        from id in Token.EqualTo(KrlToken.Identifier).Optional()
        from _lb in Token.EqualTo(KrlToken.LBracket)
        from type in Token.EqualTo(KrlToken.Type)
        from data in
            StringParser.Select(x => (Data)x).Try()
            .Or(IntParser.Select(x => (Data)x)).Try()
            .Or(RealParser.Select(x => (Data)x)).Try()
            .Or(BoolParser.Select(x => (Data)x)).Try()
            .Or(EnumParser.Select(x => (Data)x)).Try()
            .Or(StrucParser.Select(x => (Data)x)).Try() // RECURSIVE
            .ManyDelimitedBy(Token.EqualTo(KrlToken.Comma))
        from _rb in Token.EqualTo(KrlToken.RBracket)
        select new StrucData(id.HasValue ? id.Value.ToStringValue() : "", data.ToList());
}
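One detail worth checking in the parsers above: ToStringValue() returns the token text exactly as the tokenizer matched it, so with the tokenizer from the question a String value keeps its surrounding quotes and an Enum value keeps its leading '#'. If the bare text is wanted instead, a small helper along these lines could be applied inside the select clauses. This is only a sketch: the KrlText class and its method names are made up for illustration, and it does not decode the \" and \\ escapes that the tokenizer accepts.

```csharp
using System;

public static class KrlText
{
    // Strips one pair of surrounding double quotes, if both are present.
    public static string Unquote(string raw) =>
        raw.Length >= 2 && raw[0] == '"' && raw[raw.Length - 1] == '"'
            ? raw.Substring(1, raw.Length - 2)
            : raw;

    // Drops the leading '#' that marks an enum value, if present.
    public static string EnumName(string raw) =>
        raw.StartsWith("#", StringComparison.Ordinal) ? raw.Substring(1) : raw;
}
```

For example, the select clause of StringParser would become new StringData(id.ToStringValue(), KrlText.Unquote(val.ToStringValue())).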

I also added another constructor for the StrucData class to accept the List<Data>:

public StrucData(string name, List<Data> data)
{
    Name = name;
    Value = data;
}

Then to actually parse an input string, run this:

string input = @"{PROGLOG: ProgName[] ""any ascii string {[]%,&}"", StartDate {DATE: CSEC 0.124, SEC -22, MIN 36, HOUR 16, DAY 4, MONTH 1, YEAR 2019}, EndDate {DATE: CSEC 0, SEC 36, MIN 36, HOUR 16, DAY 4, MONTH 1, YEAR 2019}, QuitDate {DATE: CSEC 0, SEC 36, MIN 36, HOUR 16, DAY 4, MONTH 1, YEAR 2019}, ActiveTime 11.00000, MyInt 10, MyReal -1.091e-24, MyCHAR ""A"", MyBool False, MyEnum #EnumValue}";

var tokens = KrlTokenizer.Instance.Tokenize(input);
StrucData data = KrlParsers.StrucParser.Parse(tokens);
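
Tokenize and Parse throw a ParseException when the input is malformed. Since the string comes from a robot controller over TCP, it may be more convenient to get the error back as a value. Below is a minimal sketch of that, assuming the KrlTokenizer, KrlParsers and StrucData types shown earlier are in scope. The KrlReader class and its TryRead method are names invented for this example. It uses Superpower's TryTokenize, TryParse and AtEnd.

```csharp
using Superpower;

public static class KrlReader
{
    // Returns true and the parsed struc on success; otherwise false and a
    // human-readable message describing where tokenizing or parsing failed.
    public static bool TryRead(string input, out StrucData data, out string error)
    {
        data = null;
        error = null;

        // TryTokenize reports a failure as a result instead of throwing.
        var tokens = KrlTokenizer.Instance.TryTokenize(input);
        if (!tokens.HasValue)
        {
            error = tokens.ToString();
            return false;
        }

        // AtEnd() additionally rejects tokens left over after the closing brace.
        var parsed = KrlParsers.StrucParser.AtEnd().TryParse(tokens.Value);
        if (!parsed.HasValue)
        {
            error = parsed.ToString();
            return false;
        }

        data = parsed.Value;
        return true;
    }
}
```

A caller can then log the error and retry the read instead of wrapping every call in a try/catch.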