I need to save multiple dataframes in CSV, all in a same zip file. Is it possible without making temporary files?
I tried using zipfile:
with zipfile.ZipFile("archive.zip", "w") as zf:
with zf.open(f"file1.csv", "w") as buffer:
data_frame.to_csv(buffer, mode="wb")
This works with to_excel
but fails with to_csv
as as zipfiles expects binary data and to_csv
writes a string, despite the mode="wb"
parameter:
.../lib/python3.8/site-packages/pandas/io/formats/csvs.py", line 283, in _save_header
writer.writerow(encoded_labels)
.../lib/python3.8/zipfile.py", line 1137, in write
TypeError: a bytes-like object is required, not 'str'
On the other hand, I tried using the compression
parameter of to_csv
, but the archive is overwritten, and only the last dataframe remains in the final archive.
If no other way, I'll use temporary files, but I was wondering if someone have an idea to allow to_csv
and zipfile
work together.
Thanks in advance!
I would approach this following way
import io
import pandas as pd
df = pd.DataFrame({"x":[1,2,3]})
string_io = io.StringIO()
df.to_csv(string_io)
string_io.seek(0)
df_bytes = string_io.read().encode('utf-8')
as df_bytes
is bytes it should now work with zipfile
. Edit: after looking into to_csv
help I found simpler way, to get bytes
namely:
import pandas as pd
df = pd.DataFrame({"x":[1,2,3]})
df_bytes = df.to_csv().encode('utf-8')