python json dictionary rrdtool

Converting str to dict in Python


I got this output from a process started with subprocess.Popen():

    { about: 'RRDtool xport JSON output',
      meta: {
        start: 1401778440,
        step: 60,
        end: 1401778440,
        legend: [
          'rta_MIN',
          'rta_MAX',
          'rta_AVERAGE'
              ]
         },
      data: [
        [ null, null, null ],
        [ null, null, null ],
        [ null, null, null ],
        [ null, null, null ],
        [ null, null, null ],
        [ null, null, null  ]
      ]
    }

It doesn't look like valid JSON to me. I have tried ast.literal_eval() and json.loads(), but neither worked. Can someone point me in the right direction? Thanks in advance.


Solution

  • Indeed, older versions of rrdtool export ECMAScript object notation, not JSON. According to this Debian bug report, upgrading to 1.4.8 should give you proper JSON. Also see the project CHANGELOG:

    JSON output of xport is now actually json compilant by its keys being properly quoted now.
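
To illustrate what that changelog entry means in practice, here is a minimal sketch using only the standard library. The two sample strings are shortened, made-up variants of the output in the question, and `is_valid_json` is a hypothetical helper name. ast.literal_eval() is unsuitable for a similar reason: the bare keys and `null` are not Python literals.

```python
import json

# Old-style output: bare keys and single-quoted strings (shortened sample).
legacy = "{ about: 'RRDtool xport JSON output', meta: { step: 60 } }"

# The same data with keys and strings double-quoted, as strict JSON requires.
strict = '{ "about": "RRDtool xport JSON output", "meta": { "step": 60 } }'


def is_valid_json(text):
    """Return True if the standard-library JSON parser accepts the text."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


print(is_valid_json(legacy))  # False
print(is_valid_json(strict))  # True
```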

    If you cannot upgrade, you have two options: either repair the text by quoting the object keys, or use a more lenient parser that accepts ECMAScript object notation.

    The latter can be done with the external demjson library:

    >>> import demjson
    >>> demjson.decode('''\
    ... { about: 'RRDtool xport JSON output',
    ...   meta: {
    ...     start: 1401778440,
    ...     step: 60,
    ...     end: 1401778440,
    ...     legend: [
    ...       'rta_MIN',
    ...       'rta_MAX',
    ...       'rta_AVERAGE'
    ...           ]
    ...      },
    ...   data: [
    ...     [ null, null, null ],
    ...     [ null, null, null ],
    ...     [ null, null, null ],
    ...     [ null, null, null ],
    ...     [ null, null, null ],
    ...     [ null, null, null  ]
    ...   ]
    ... }''')
    {u'about': u'RRDtool xport JSON output', u'meta': {u'start': 1401778440, u'step': 60, u'end': 1401778440, u'legend': [u'rta_MIN', u'rta_MAX', u'rta_AVERAGE']}, u'data': [[None, None, None], [None, None, None], [None, None, None], [None, None, None], [None, None, None], [None, None, None]]}
    

    Repairing can be done with a regular expression. I am going to assume that every key is either at the start of a line or directly after an opening { curly brace. The single quotes around the string values also have to be changed to double quotes; this only works if no value contains an embedded single quote:

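    import re
    import json
    
    # yourtext holds the raw rrdtool output as a str.
    # Wrap each bare key (at the start of a line or right after "{") in double quotes.
    yourtext = re.sub(r'(?:^|(?<={))\s*(\w+)(?=:)', r' "\1"', yourtext, flags=re.M)
    # Turn the single-quoted strings into double-quoted JSON strings.
    yourtext = re.sub(r"'", r'"', yourtext)
    data = json.loads(yourtext)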
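
The repair can also be wrapped in a function that tries strict JSON first, so the same code keeps working after an rrdtool upgrade. This is only a sketch under the assumptions already stated (keys at the start of a line or right after `{`, no embedded single quotes in values); the name `parse_rrdtool_xport` and the strict-first fallback are illustrative additions, and the sample is a shortened copy of the output from the question.

```python
import json
import re


def parse_rrdtool_xport(text):
    """Parse rrdtool xport output, repairing legacy unquoted keys if needed."""
    try:
        # Newer rrdtool versions emit strict JSON; accept it unchanged.
        return json.loads(text)
    except ValueError:
        pass
    # Quote bare keys found at the start of a line or directly after "{".
    repaired = re.sub(r'(?:^|(?<={))\s*(\w+)(?=:)', r' "\1"', text, flags=re.M)
    # Convert single-quoted strings to double-quoted ones.
    repaired = repaired.replace("'", '"')
    return json.loads(repaired)


sample = """\
{ about: 'RRDtool xport JSON output',
  meta: {
    start: 1401778440,
    step: 60,
    end: 1401778440,
    legend: [
      'rta_MIN',
      'rta_MAX',
      'rta_AVERAGE'
          ]
     },
  data: [
    [ null, null, null ],
    [ null, null, null  ]
  ]
}"""

data = parse_rrdtool_xport(sample)
print(data["meta"]["legend"])  # ['rta_MIN', 'rta_MAX', 'rta_AVERAGE']
```

If the text still cannot be parsed after the repair, the final json.loads() call raises ValueError, so the caller can decide how to handle unexpected output.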