python json parsing parse-platform deserialization

Is there a library for parsing such serialized objects in Python?


My Python program receives an input string that represents a serialized object, which can contain primitive types, arrays, and structures.

Sample input can look like this:

Struct(1.5, false, Struct2("text"), [1, 2, 3])

Sample output would be:

{
    type: "Struct",
    args: [
        1.5,
        False,
        {
            type: "Struct2",
            args: [ "text" ]
        },
        [ 1, 2, 3 ]
    ]
}

So, the input string can have:

  • Primitive types (integers, floats, boolean and string literals)
  • Arrays
  • Structures (structure name and a list of arguments)

The input format is quite logical, but I couldn't find any readily available library or code snippet that parses it.


Solution

  • This isn't a very clean implementation, and I'm not 100% sure it does exactly what you're looking for, but I would recommend the Lark library for this.

    Rather than looking for a ready-made parser for this exact format, write a small grammar of your own. To save startup time, Lark has "save" and "load" features (available for its LALR parser), so you can store a serialized version of the parser and load it on each run instead of rebuilding the whole parser every time. Hope this helps :)

    from lark import Lark, Transformer
    
    grammar = """
    %import common.WS
    %import common.ESCAPED_STRING
    %import common.SIGNED_NUMBER
    
    %ignore WS
    
    start : struct
    
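    // Use (...)? rather than [...] so an empty argument list produces no
    // None placeholder (Lark >= 1.0 enables maybe_placeholders by default).
    struct  : NAME "(" (element ("," element)*)? ")"
    element : struct | array | primitive
    
    array : "[" (element ("," element)*)? "]"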
    primitive : number
              | string
              | boolean
    
    string : ESCAPED_STRING
    number : SIGNED_NUMBER
    
    boolean : TRUE | FALSE
    
    NAME : /[a-zA-Z][a-zA-Z0-9]*/
    
    TRUE : "true"
    FALSE : "false"
    """
    
    class T(Transformer):
        def start(self, s):
            return s[0]
    
        def string(self, s):
            return s[0][1:-1].replace('\\"', '"')
    
        def primitive(self, s):
            return s[0]
    
        def struct(self, s):
            return { "type": s[0].value, "args": s[1:] }
    
        def boolean(self, s):
            return s[0].value == "true"
    
        def element(self, s):
            return s[0]
        
        array = list
    
        def number(self, s):
            try:
                return int(s[0].value)
            except ValueError:  # not an integer literal, e.g. "1.5"
                return float(s[0].value)
    
    parser = Lark(grammar, parser="lalr", transformer=T())
    
    test = """
    Struct(1.5, false, Struct2("text"), [1, 2, 3])
    """
    
    print(parser.parse(test))