Reading bytearray-formatted strings from a file into Python


I have a text file with a bytearray-like formatted string on each line, such as: b'{"delete":{"status":{"id":554377123205378048,"id_str":"554377123205378048","user_id":981513812,"user_id_str":"981513812"},"timestamp_ms":"1508108338761"}}'

(This comes from streaming Twitter data that was written to a text file from the command line with "python twitterstream.py > output.txt".)

I am now trying to read each line in and use json.loads() on it to get the dictionary that this line is supposed to be.

If I use line = open('output.txt').readline() I get a string that looks like this: 'b\'{"delete":{"status":{"id":554377123205378048,"id_str":"554377123205378048","user_id":981513812,"user_id_str":"981513812"},"timestamp_ms":"1508108338761"}}\'\n'

Notice the extra escape sequences that got added, like 'b\'. json.loads() can no longer parse this line. If I manually copy the contents into the Python console, the original line parses, so the line itself is fine. What is my file I/O doing to mess it up?

Also, when I copy the line manually into a variable, it is stored as a bytes object, so I guess the question is: how do I get Python to read the file's lines as the literal byte strings they were written as?


Solution

  • >>> import ast, json
    >>> ast.literal_eval("""b'{"delete":{"status":{"id":554377123205378048,"id_str":"554377123205378048","user_id":981513812,"user_id_str":"981513812"},"timestamp_ms":"1508108338761"}}'""")
    b'{"delete":{"status":{"id":554377123205378048,"id_str":"554377123205378048","user_id":981513812,"user_id_str":"981513812"},"timestamp_ms":"1508108338761"}}'
    >>> json.loads(ast.literal_eval("""b'{"delete":{"status":{"id":554377123205378048,"id_str":"554377123205378048","user_id":981513812,"user_id_str":"981513812"},"timestamp_ms":"1508108338761"}}'""").decode('utf-8'))
    {'delete': {'status': {'user_id_str': '981513812', 'id_str': '554377123205378048', 'id': 554377123205378048, 'user_id': 981513812}, 'timestamp_ms': '1508108338761'}}
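
To apply the same idea to the whole file rather than to one pasted line, something like the following sketch may help. It assumes every non-empty line of output.txt holds the printed form of a bytes object (b'...'), which is what the question describes. The backslashes in the readline() output are most likely just how Python displays a str that itself contains single quotes, not characters added by the file I/O. The helper names parse_bytes_repr_line and read_records are made up for this illustration.

```python
import ast
import json


def parse_bytes_repr_line(line):
    """Turn one line such as  b'{"a": 1}'  (the text of a bytes literal) into a dict.

    Hypothetical helper: assumes the line is a valid Python bytes literal
    whose content is UTF-8 encoded JSON.
    """
    # ast.literal_eval safely evaluates the literal: str "b'...'" -> bytes
    raw = ast.literal_eval(line.strip())
    # Decode the bytes to text, then parse the JSON into a dict
    return json.loads(raw.decode("utf-8"))


def read_records(path):
    """Yield one parsed dict per non-empty line of the file at `path`."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # skip blank lines
                yield parse_bytes_repr_line(line)
```

It could then be used as `for record in read_records("output.txt"): ...`. On Python 3.6 and later, json.loads() also accepts bytes directly, so the explicit .decode('utf-8') is optional there. If the script that produced the file can be changed, decoding each bytes object before printing it would avoid the b'...' wrapper in the first place.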