Parsing string to dictionary


I'm working on a communications project with a radio that transmits a formatted string message, similar to:

message_string = 'Transmission\n variables \n  0.01 First variable\n  0.02 Second variable\n  0.03 Third variable \n More variables\n  0.03 Next variable\n  0.04 Another variable'

When printed, this looks like:

print(message_string)
Transmission
 variables
  0.01 First variable
  0.02 Second variable
  0.03 Third variable
 More variables
  0.03 Next variable
  0.04 Another variable

This looks nice to humans, but is tricky for the computer, especially since I am trying to convert it to a Python dictionary. In my actual system there are quite a few of these variables, and the code needs to systematically process all of them into a dictionary.

I think it might include something like

message_string = message_string.replace('\n','{')

but deciding which direction of brackets to use in different cases, and where to put the colons for the dictionary, is confusing me. I want an output similar to

message_dict = {
    'variables': {
        'First variable': 0.01,
        'Second variable': 0.02,
        'Third variable': 0.03,
    },
    'More variables': {
        'Next variable': 0.03,
        'Another variable': 0.04,
    },
}

where no error is thrown if one of the variables is missing from the transmission (since that sometimes happens).

How do I convert this string into a dictionary?


Solution

  • Assuming that the indentation increases by exactly one space per nesting level, you could use this stack-based solution:

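    def to_dict(s):
        result = {}
        # stack[n] is the dict that receives lines indented by n spaces
        stack = [result]
        for line in s.splitlines():
            stripped = line.lstrip()
            if not stripped:
                continue  # skip blank lines, which would otherwise raise IndexError
            # Number of leading spaces, plus 1
            indent = len(line) - len(stripped) + 1
            if indent >= len(stack):
                stack.append(None)  # make room for one deeper level
            if stripped[0].isdigit():
                # Value line such as "0.01 First variable"
                value, key = stripped.split(" ", 1)
                stack[indent-1][key] = float(value)
            else:
                # Header line: open a new nested dict for the lines below it
                stack[indent-1][stripped] = stack[indent] = {}

        return result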
    

    Call it like this:

    message_string = 'Transmission\n variables \n  0.01 First variable\n  0.02 Second variable\n  0.03 Third variable \n More variables\n  0.03 Next variable\n  0.04 Another variable'
    d = to_dict(message_string)
    

    For this example, d will be:

    {
        'Transmission': {
            'variables ': {
                'First variable': 0.01, 
                'Second variable': 0.02, 
                'Third variable ': 0.03
            }, 
            'More variables': {
                'Next variable': 0.03, 
                'Another variable': 0.04
            }
        }
    }
    

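    Compared to your desired output, this has an extra top level, 'Transmission'. Since that line really is part of the input, I kept it. Note also that trailing whitespace in the input is preserved in the keys ('variables ' and 'Third variable '), because only leading whitespace is stripped from each line.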
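
If the trailing spaces in the keys, the fixed one-space indent, or the extra 'Transmission' level get in the way, the same stack idea can be adapted. The following is only a sketch under some assumptions: `parse_message` is a hypothetical name, a line counts as a value line whenever its first word can be parsed by `float()`, and a line belongs to the nearest preceding header with a strictly smaller indent. Missing variables are handled at lookup time with `dict.get`, which returns `None` instead of raising `KeyError`.

```python
def parse_message(text):
    """Hypothetical variant of to_dict: any indent width, keys stripped."""
    root = {}
    # Each stack entry is (indent of a header line, dict holding its children).
    # The sentinel indent -1 makes root the parent of every top-level line.
    stack = [(-1, root)]
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        indent = len(line) - len(line.lstrip())
        # Drop headers that are at the same depth or deeper than this line.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        first, _, rest = stripped.partition(" ")
        try:
            number = float(first)
        except ValueError:
            # Not a number: treat the whole line as a header.
            child = {}
            parent[stripped] = child
            stack.append((indent, child))
        else:
            # Value line such as "0.01 First variable".
            parent[rest.strip()] = number
    return root


message_string = 'Transmission\n variables \n  0.01 First variable\n  0.02 Second variable\n  0.03 Third variable \n More variables\n  0.03 Next variable\n  0.04 Another variable'

parsed = parse_message(message_string)
# Unwrap the top level; fall back to an empty dict if it is absent.
message_dict = parsed.get("Transmission", {})

# Chained .get calls never raise, even if a group or variable was not sent.
third = message_dict.get("variables", {}).get("Third variable")
missing = message_dict.get("variables", {}).get("Fourth variable")
```

Here `third` is 0.03 and `missing` is `None`. One caveat of this sketch: a header whose first word happens to be accepted by `float()` (for example "inf" or "nan") would be misread as a value line, so the original `isdigit()` check may be preferable if such headers can occur.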