
Struggling to parse an object using jsonlines


I'm having trouble parsing the body of a request using jsonlines. I'm using Tornado as the server, and this happens inside a post() method. My goal is to split the request body into separate JSON objects, iterate over them with a jsonlines Reader, do some work on each one, and then push them to a DB. I worked around the problem by dumping the UTF-8 encoded body into a file and then using:

with jsonlines.open("temp.txt") as reader:

That works for me. I can iterate over the entire file with

for obj in reader:

This feels like unnecessary overhead that I could avoid if I understood what's keeping me from using this bit of code instead:

log = self.request.body.decode("utf-8")
with jsonlines.Reader(log) as reader:
    for obj in reader:
        ...  # process obj

The exception I get is this:

jsonlines.jsonlines.InvalidLineError: line contains invalid json: Expecting property name enclosed in double quotes: line 1 column 2 (char 1) (line 1)

I've searched for this error here, and all I found were cases where people used incorrectly formatted JSON with single quotes instead of double quotes. That is not the case for me: I debugged the request and saw that the string returned by decode() does have double quotes around both property names and values.

Here is a sample of the request body I send (this is what it looks like in Postman):

{"type":"event","timestamp":"2018-03-25 09:19:50.999","event":"ButtonClicked","params":{"screen":"MainScreen","button":"SettingsButton"}} 
{"type":"event","timestamp":"2018-03-25 09:19:51.061","event":"ScreenShown","params":{"name":"SettingsScreen"}} 
{"type":"event","timestamp":"2018-03-25 09:19:53.580","event":"ButtonClicked","params":{"screen":"SettingsScreen","button":"MissionsButton"}} 
{"type":"event","timestamp":"2018-03-25 09:19:53.615","event":"ScreenShown","params":{"name":"MissionsScreen"}}

You can reproduce the exception by using this simple bit of code in a post method and sending the lines I provided through Postman:

log = self.request.body.decode("utf-8")
with jsonlines.Reader(log) as currentlog:
    for obj in currentlog:
        print(obj)

As a sidenote: Postman sends the data as text, not JSON.

If you need any more information to answer this question, please let me know. One thing I did notice is that the string returned by decode() starts and ends with a single quote. I guess this is because of the double quotes in the JSON itself. Is it related in any way? An example:

'{"type":"event","timestamp":"2018-03-25 09:19:50.999","event":"ButtonClicked","params":{"screen":"MainScreen","button":"SettingsButton"}}'

Thanks for any help!


Solution

  • jsonlines.Reader accepts an iterable as its argument ("The first argument must be an iterable that yields JSON encoded strings"), not a single JSON-encoded string as in your example. After .decode("utf-8"), log is a str, which happens to be iterable too, but iterating it yields one character at a time. So when the reader pulls the first item from log under the hood, it gets the character { and tries to parse it as a JSON line, which is obviously invalid. Try log = log.splitlines() before passing log to the Reader. (A bare log.split() would split on every whitespace character, including the spaces inside your timestamp values, so it would break these lines apart.)
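
A minimal sketch of what goes wrong and of one way around it, using only the standard library's json module so that it runs without jsonlines installed. The body bytes are shortened versions of two lines from the question (the params field is dropped), and all variable names here are illustrative rather than taken from your handler.

```python
import io
import json

# Stand-in for self.request.body: two shortened lines from the question.
# Note the space inside each timestamp value.
body = (
    b'{"type":"event","timestamp":"2018-03-25 09:19:50.999","event":"ButtonClicked"}\n'
    b'{"type":"event","timestamp":"2018-03-25 09:19:51.061","event":"ScreenShown"}\n'
)
log = body.decode("utf-8")

# Iterating a str yields single characters, so the first "line"
# a line-oriented reader receives is just "{".
first_item = next(iter(log))

try:
    json.loads(first_item)
    error_message = None
except json.JSONDecodeError as exc:
    # Parsing a lone "{" fails, which is the situation described above.
    error_message = str(exc)

# splitlines() cuts only at line breaks, giving one JSON document per item.
events = [json.loads(line) for line in log.splitlines()]

# split() cuts at every whitespace run, including the space in each timestamp.
broken_pieces = log.split()

# A file-like wrapper also iterates line by line; json.loads tolerates
# the trailing newline on each line.
events_from_stream = [json.loads(line) for line in io.StringIO(log)]

# Assumption: with the jsonlines package installed, either of these should
# hand the Reader whole lines instead of characters:
#   jsonlines.Reader(log.splitlines())
#   jsonlines.Reader(io.StringIO(log))
```

As for the single quotes around the decoded string in the debugger: that is just how Python displays a str value. They are not part of the data and are unrelated to the error.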