I have a large list of dict objects. I would like to store this list in a tar file to exchange remotely. I have done that successfully by writing a json.dumps() string to a tarfile object opened in 'w:gz' mode.
I am trying for a piped implementation, opening the tarfile object in 'w|gz' mode. Here is my code so far:
from json import dump
from io import StringIO
import tarfile

with StringIO() as out_stream, tarfile.open(filename, 'w|gz', out_stream) as tar_file:
    for packet in json_io_format(data):
        dump(packet, out_stream)
This code is in a function 'write_data'. 'json_io_format' is a generator that yields one dict object at a time from the dataset (so packet is a dict).
Here is my error:
Traceback (most recent call last):
  File "pdml_parser.py", line 35, in write_data
    dump(packet, out_stream)
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 2397, in __exit__
    self.close()
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 1733, in close
    self.fileobj.close()
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 459, in close
    self.fileobj.write(self.buf)
TypeError: string argument expected, got 'bytes'
After some troubleshooting with help from the comments, I found that the error is raised when the 'with' statement exits and calls the context manager's __exit__, which I believe in turn calls TarFile.close(). If I take the tarfile.open() call out of the 'with' statement and purposely leave out TarFile.close(), I get this code:
with StringIO() as out_stream:
    tar_file = tarfile.open(filename, 'w|gz', out_stream)
    for packet in json_io_format(data):
        dump(packet, out_stream)
This version of the program completes, but it does not produce the output file 'filename' and yields this error:
Exception ignored in: <bound method _Stream.__del__ of <tarfile._Stream object at 0x7fca7a352b00>>
Traceback (most recent call last):
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 411, in __del__
    self.close()
  File "/.../anaconda3/lib/python3.5/tarfile.py", line 459, in close
    self.fileobj.write(self.buf)
TypeError: string argument expected, got 'bytes'
I believe that is caused by the garbage collector. Something is preventing the TarFile object from closing.
Can anyone help me figure out what is going on here?
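Why do you think you can write a tarfile to a StringIO? That doesn't work like you think it does. A tar archive is binary data: tarfile writes bytes to its file object, while StringIO only accepts str, which is exactly what the TypeError is telling you.
The following approach, which swaps StringIO for BytesIO, doesn't error, but it's not actually how you create a tarfile in memory from in-memory objects.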
from json import dumps
from io import BytesIO
import tarfile

data = [{'foo': 'bar'},
        {'cheese': None},
        ]
filename = 'fnord'
with BytesIO() as out_stream, tarfile.open(filename, 'w|gz', out_stream) as tar_file:
    for packet in data:
        out_stream.write(dumps(packet).encode())
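For comparison, here is a minimal sketch of one way to build a real tar archive in memory: serialise the packets first, describe them with a tarfile.TarInfo, and add them with TarFile.addfile() rather than writing to the underlying stream yourself. The helper names build_tar_gz and read_tar_gz and the member name 'data.json' are made up for illustration, and storing one JSON document per line is an assumption, not something the question specifies.

```python
import io
import json
import tarfile


def build_tar_gz(packets, member_name="data.json"):
    """Return the bytes of a gzip-compressed tar holding one JSON-lines member.

    Hypothetical helper: the name and the one-document-per-line layout
    are illustrative assumptions.
    """
    # A tar header records the member size, so serialise the payload first.
    payload = "\n".join(json.dumps(p) for p in packets).encode("utf-8")

    info = tarfile.TarInfo(name=member_name)
    info.size = len(payload)

    archive = io.BytesIO()
    with tarfile.open(fileobj=archive, mode="w|gz") as tar_file:
        # addfile() writes the header and the data through tarfile itself.
        tar_file.addfile(info, io.BytesIO(payload))
    # The archive is complete only once the TarFile has been closed.
    return archive.getvalue()


def read_tar_gz(blob, member_name="data.json"):
    """Read the packets back out of an archive made by build_tar_gz."""
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar_file:
        member = tar_file.extractfile(member_name)
        text = member.read().decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]
```

Calling build_tar_gz(data) with the list above should give you bytes that you can send over the wire or write to a file opened in 'wb' mode. The payload is fully serialised before it is added because the tar header needs the member size up front, so this sketch does not stream one packet at a time.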