I'm creating a JSON structure which I ultimately need to save to a file but am having problems with embedded line feed characters.
I first create a dictionary:
changes = {
"20161101": "Added logging",
"20161027": "Fixed scrolling bug",
"20161024": "Added summary functionality"
}
and then convert it to a single line-feed separated string:
changes_str = '\n'.join([ "{0} - {1}".format(x, y) for x, y in changes.items() ])
print changes_str
'20161101 - Added logging\n20161027 - Fixed scrolling bug\n20161024 - Added summary functionality'
So far, so good. Now I add it into a string (which in reality would come from a text template):
changes_str_json_str = '{ "version": 1.1, "changes": "' + changes_str + '" }'
print changes_str_json_str
'{ "version": 1.1, "changes": "20161101 - Added logging\n20161027 - Fixed scrolling bug\n20161024 - Added summary functionality" }'
but when I come to decode this into a JSON object using json.loads, I hit problems:
json_obj = json.loads(changes_str_json_str)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/opt/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/opt/python2.7/json/decoder.py", line 380, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Invalid control character at: line 1 column 55 (char 54)
Changing the line feed to another character does fix the problem, so clearly that's where the problem lies. However, I do need the character to be a line feed, as ultimately the data in the file needs to be formatted like this (the file is passed on to another system over which I have no control). Also, as far as I know, line feed is a supported character in JSON strings.
What exactly is the problem here and how can I work around it?
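In JSON you need to properly escape control characters, including \n. A raw line feed is not allowed inside a JSON string; it has to appear in the JSON text as the two-character escape sequence: a backslash followed by n. Here's an example of what's currently happening: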
>>> import json
>>> json.loads('"foo\nbar"')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\python35\lib\json\__init__.py", line 319, in loads
return _default_decoder.decode(s)
File "C:\python35\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "C:\python35\lib\json\decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 5 (char 4)
If you properly escape the newline character with a backslash, it works as expected:
>>> json.loads('"foo\\nbar"')
'foo\nbar'
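To see what the escaped form looks like, it can help to let json.dumps produce it. This is only a small sketch; the variable names raw and encoded are mine, chosen for illustration:

```python
import json

raw = "foo\nbar"            # Python string containing one real line feed
encoded = json.dumps(raw)   # JSON text: the line feed becomes backslash + n

print(encoded)              # "foo\nbar"  (backslash and n, not a line break)

assert "\n" not in encoded          # no raw control character in the JSON text
assert "\\n" in encoded             # the two-character escape is there instead
assert json.loads(encoded) == raw   # decoding restores the real line feed
```

So the line feed is supported in JSON strings, but only in its escaped form; the decoder turns it back into a real line feed.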
So you could fix your code by doing the following:
changes_str = '\\n'.join([ "{0} - {1}".format(x, y) for x, y in changes.items() ])
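If the JSON text really has to come from a text template, escaping only the line feed by hand still leaves quotes, backslashes and other control characters unhandled. One option is to run just the value through json.dumps and substitute the result into the template. A minimal sketch, assuming a str.format style template (the template string and the placeholder name changes are hypothetical, not from your code):

```python
import json

# Real line feeds, exactly as produced by '\n'.join(...)
changes_str = "20161101 - Added logging\n20161027 - Fixed scrolling bug"

# Hypothetical template. There are no quotes around the placeholder,
# because json.dumps adds them together with any escapes that are needed.
template = '{{ "version": 1.1, "changes": {changes} }}'

json_text = template.format(changes=json.dumps(changes_str))
json_obj = json.loads(json_text)

# The decoded value is identical to the original string, line feed included.
assert json_obj["changes"] == changes_str
```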
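The better alternative would be to first construct the object you want to output and then use json.dumps, so you wouldn't have to worry about escaping at all. Note that changes_str should be built with the original '\n'.join here, since dumps does the escaping for you: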
obj = {
    'version': 1.1,
    'changes': changes_str
}
changes_str_json_str = json.dumps(obj)
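Since the end goal is a file, here is a rough sketch of the full round trip using json.dump. It is written for Python 3, and the temporary directory and the file name changes.json are placeholders I made up for the example:

```python
import json
import os
import tempfile

changes_str = "20161101 - Added logging\n20161027 - Fixed scrolling bug"
obj = {"version": 1.1, "changes": changes_str}

# Placeholder location, used here only so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "changes.json")

with open(path, "w") as f:
    json.dump(obj, f)           # writes the line feed as the escape \n

with open(path) as f:
    file_text = f.read()

# The file holds the escaped form, so it contains no raw line feed.
assert "\n" not in file_text

# Any conforming JSON parser gets the real line feed back when reading it.
restored = json.loads(file_text)
assert restored["changes"] == changes_str
```

The file will contain the characters backslash and n rather than an actual line break, which is what valid JSON requires; the receiving system should see a real line feed once it parses the file.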