Tags: python, json, dictionary, linefeed

Can't create JSON doc from dict with string containing line feed chars


I'm creating a JSON structure which I ultimately need to save to a file but am having problems with embedded line feed characters.

I first create a dictionary:

changes = {
    "20161101": "Added logging",
    "20161027": "Fixed scrolling bug",
    "20161024": "Added summary functionality"
}

and then convert it to a single line-feed separated string:

changes_str = '\n'.join([ "{0} - {1}".format(x, y) for x, y in changes.items() ])
print repr(changes_str)
'20161101 - Added logging\n20161027 - Fixed scrolling bug\n20161024 - Added summary functionality'

So far, so good. Now I add it into a string (which in reality would come from a text template):

changes_str_json_str = '{ "version": 1.1, "changes": "' + changes_str + '" }'
print repr(changes_str_json_str)
'{ "version": 1.1, "changes": "20161101 - Added logging\n20161027 - Fixed scrolling bug\n20161024 - Added summary functionality" }'

but when I try to decode this string into a Python object using json.loads, I hit problems:

json_obj = json.loads(changes_str_json_str)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/opt/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/opt/python2.7/json/decoder.py", line 380, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 55 (char 54)

Changing the line feed to another character fixes the problem, so that's clearly where the problem lies. However, I do need the character to be a line feed, because ultimately the data in the file needs to be formatted like this (the file is passed on to another system over which I have no control). Also, as far as I know, line feed is a supported character in JSON strings.

What exactly is the problem here and how can I work around it?


Solution

  • In JSON, control characters inside a string, including the line feed, must be escaped (a line feed is written as \n). A raw line feed between the quotes is invalid. Here's an example of what's currently happening:

    >>> import json
    >>> json.loads('"foo\nbar"')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\python35\lib\json\__init__.py", line 319, in loads
        return _default_decoder.decode(s)
      File "C:\python35\lib\json\decoder.py", line 339, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "C:\python35\lib\json\decoder.py", line 355, in raw_decode
        obj, end = self.scan_once(s, idx)
    json.decoder.JSONDecodeError: Invalid control character at: line 1 column 5 (char 4)
    

    If you escape the backslash, so that the JSON parser receives the two characters \n rather than a real line feed, it works as expected:

    >>> json.loads('"foo\\nbar"')
    'foo\nbar'
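
The difference between the two calls is easier to see by inspecting the strings the parser actually receives. A small sketch, written for Python 3 (the variable names are only for illustration):

```python
import json

raw = '"foo\nbar"'       # Python turns \n into a real line feed: 9 characters
escaped = '"foo\\nbar"'  # a backslash followed by 'n' reaches the parser: 10 characters

print(len(raw), len(escaped))

try:
    json.loads(raw)
    raw_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    raw_ok = False

# The JSON escape sequence \n decodes to a real line feed in the result.
decoded = json.loads(escaped)
print(raw_ok, repr(decoded))
```

So the decoded Python string still contains a genuine line feed; only its JSON representation uses the escape.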
    

    So you could fix your code by doing the following:

    changes_str = '\\n'.join([ "{0} - {1}".format(x, y) for x, y in changes.items() ])
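
Note that this only escapes the separator; a double quote, backslash or tab inside one of the change descriptions would still break the result. If the surrounding JSON has to come from a text template, one option is to let json.dumps escape just the value and substitute that into the template. A sketch, assuming Python 3 and a hypothetical template with a $changes placeholder (the template format is an assumption, not something taken from the question):

```python
import json
from string import Template

# Sample value containing both a line feed and embedded double quotes.
changes_str = '20161101 - Added logging\n20161027 - Fixed "scrolling" bug'

# Hypothetical template. There are no quotes around $changes,
# because json.dumps adds them itself.
template = Template('{ "version": 1.1, "changes": $changes }')

# json.dumps on a plain string returns a quoted, fully escaped JSON string literal.
json_text = template.substitute(changes=json.dumps(changes_str))

json_obj = json.loads(json_text)
print(json_obj["changes"])
```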
    

    The better alternative is to first construct the object you want to output and then use json.dumps, so you don't have to worry about escaping at all:

    obj = {
        'version': 1.1,
        'changes': changes_str
    }
    changes_str_json_str = json.dumps(obj)
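
Since the question ultimately needs the data saved to a file, here is a sketch of the whole flow with a read-back check. It assumes Python 3; the file name changes.json and the temporary directory are placeholders, not anything from the question.

```python
import json
import os
import tempfile

changes = {
    "20161101": "Added logging",
    "20161027": "Fixed scrolling bug",
    "20161024": "Added summary functionality",
}

changes_str = "\n".join("{0} - {1}".format(x, y) for x, y in changes.items())
obj = {"version": 1.1, "changes": changes_str}

# Placeholder location; use whatever path the other system expects.
path = os.path.join(tempfile.mkdtemp(), "changes.json")

with open(path, "w") as f:
    json.dump(obj, f)  # each line feed is written as the two characters \n

with open(path) as f:
    file_text = f.read()

# Parsing the file turns the escapes back into real line feeds.
restored = json.loads(file_text)
print(restored["changes"])
```

In the file, the line feeds appear as the escape sequence \n, and any conforming JSON parser on the receiving side turns them back into real line feeds. If the other system instead expects literal line breaks inside the quoted value, then what it wants is not valid JSON.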