Tags: python, json, python-3.x, tarfile

Dumping JSON directly into a tarfile


I have a large list of dict objects. I would like to store this list in a tar file so I can exchange it remotely. I have done that successfully by writing a json.dumps() string to a tarfile object opened in 'w:gz' mode.
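
For reference, the 'w:gz' approach described above might look like the minimal sketch below. It assumes the whole list is serialized into a single archive member; the function name write_json_tar and the member name data.json are hypothetical. Because a tar header records each member's size before its data, the JSON string is encoded to bytes first and wrapped in a BytesIO for TarFile.addfile().

```python
import io
import json
import tarfile


def write_json_tar(path, records, member_name="data.json"):
    """Store `records` as a single JSON member inside a .tar.gz at `path`.

    `write_json_tar` and the default `member_name` are hypothetical names
    chosen for this sketch.
    """
    payload = json.dumps(records).encode("utf-8")  # tar members hold bytes, not str
    info = tarfile.TarInfo(name=member_name)
    info.size = len(payload)  # the tar header records the size before the data
    # Closing the archive (end of the with block) flushes the gzip trailer.
    with tarfile.open(path, "w:gz") as tar:
        tar.addfile(info, io.BytesIO(payload))
```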

I am now trying a piped (streaming) implementation, opening the tarfile object in 'w|gz' mode. Here is my code so far:

from json import dump
from io import StringIO
import tarfile

with StringIO() as out_stream, tarfile.open(filename, 'w|gz', out_stream) as tar_file:
    for packet in json_io_format(data):
        dump(packet, out_stream)

This code is in a function 'write_data'. 'json_io_format' is a generator that yields one dict object at a time from the dataset (so packet is a dict).

Here is my error:

Traceback (most recent call last):
  File "pdml_parser.py", line 35, in write_data
    dump(packet, out_stream)
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 2397, in __exit__
    self.close()
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 1733, in close
    self.fileobj.close()
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 459, in close
    self.fileobj.write(self.buf)
TypeError: string argument expected, got 'bytes'
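
The last line of this traceback can be reproduced without tarfile at all: io.StringIO is a text buffer whose write() accepts only str, while the gzip stream tarfile builds in 'w|gz' mode is written out as bytes. A minimal check (the helper try_write is hypothetical, for illustration only):

```python
from io import BytesIO, StringIO


def try_write(buffer, chunk):
    """Hypothetical helper: return None if buffer.write(chunk) works, else the error text."""
    try:
        buffer.write(chunk)
    except TypeError as exc:
        return str(exc)
    return None


# The two-byte gzip magic number, i.e. the kind of bytes tarfile's stream writes.
chunk = b"\x1f\x8b"
text_error = try_write(StringIO(), chunk)   # text buffer: rejects bytes
binary_error = try_write(BytesIO(), chunk)  # binary buffer: accepts bytes
```

Here text_error holds the same "string argument expected" message as the traceback, and binary_error is None.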

After some troubleshooting with help from the comments, I found that the error occurs when the 'with' statement exits and calls the context manager's __exit__ method, which (as the traceback shows) in turn calls TarFile.close(). If I remove the tarfile.open() call from the 'with' statement and deliberately leave out TarFile.close(), I get this code:

with StringIO() as out_stream:
    tar_file = tarfile.open(filename, 'w|gz', out_stream)
    for packet in json_io_format(data):
        dump(packet, out_stream)

This version of the program completes, but it does not produce the output file 'filename', and it prints this error:

Exception ignored in: <bound method _Stream.__del__ of <tarfile._Stream object at 0x7fca7a352b00>>
Traceback (most recent call last):
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 411, in __del__
    self.close()
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 459, in close
    self.fileobj.write(self.buf)
TypeError: string argument expected, got 'bytes'

I believe this is triggered by the garbage collector, since the traceback starts in _Stream.__del__. Something seems to be preventing the TarFile object from closing.

Can anyone help me figure out what is going on here?


Solution

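  • Why do you think you can write a tarfile to a StringIO? That doesn't work like you think it does. A tar archive (and its gzip stream) is binary data, so tarfile writes bytes, while StringIO.write() only accepts str; that is exactly what "TypeError: string argument expected, got 'bytes'" is telling you.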

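    This approach doesn't raise an error, but it's not actually how you create a tarfile in memory from in-memory objects: the JSON bytes are written straight into out_stream instead of being added to the archive as a member.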

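    from json import dumps
    from io import BytesIO
    import tarfile

    data = [{'foo': 'bar'},
            {'cheese': None},
            ]

    filename = 'fnord'
    # BytesIO accepts the bytes that tarfile writes, so this no longer raises.
    with BytesIO() as out_stream, tarfile.open(filename, 'w|gz', out_stream) as tar_file:
        for packet in data:
            # These bytes bypass tar_file entirely (no tar member is created),
            # and the buffer is discarded when the with block exits.
            out_stream.write(dumps(packet).encode())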
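
A sketch of what the answer is pointing toward, under a few assumptions: each packet becomes its own archive member, the function write_packets and the member names packet_00000.json, packet_00001.json, ... are hypothetical, and the archive goes to any binary file object. Every tar header records its member's size before the data, so each packet is serialized to bytes first; only the archive as a whole is streamed in 'w|gz' mode.

```python
import io
import json
import tarfile
from typing import BinaryIO, Iterable


def write_packets(fileobj: BinaryIO, packets: Iterable[dict]) -> int:
    """Stream `packets` into a gzip-compressed tar written to `fileobj`.

    Hypothetical helper: each packet is stored as a separate JSON member
    named packet_NNNNN.json. Returns the number of members written.
    """
    count = 0
    # "w|gz" writes sequentially and never seeks, so fileobj can be a pipe
    # or socket file. Closing the TarFile flushes the archive but leaves
    # fileobj itself open.
    with tarfile.open(fileobj=fileobj, mode="w|gz") as tar:
        for packet in packets:
            payload = json.dumps(packet).encode("utf-8")  # bytes, not str
            info = tarfile.TarInfo(name="packet_{:05d}.json".format(count))
            info.size = len(payload)  # must be known before the data is written
            tar.addfile(info, io.BytesIO(payload))
            count += 1
    return count
```

To write to disk instead of memory, pass a file opened in binary mode, e.g. with open(filename, 'wb') as f: write_packets(f, json_io_format(data)). When a file object is supplied, tarfile writes to that object and uses the name argument only as metadata, which is why the question's call never created a file called filename.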