Tags: python, pandas, export-to-csv, zip

Pandas - save multiple CSV in a zip archive


I need to save multiple dataframes as CSV files, all in the same zip archive. Is this possible without creating temporary files?

I tried using zipfile:

import zipfile

with zipfile.ZipFile("archive.zip", "w") as zf:
    with zf.open(f"file1.csv", "w") as buffer:
        data_frame.to_csv(buffer, mode="wb")

This works with to_excel but fails with to_csv, because zipfile expects binary data and to_csv writes strings, despite the mode="wb" parameter:

.../lib/python3.8/site-packages/pandas/io/formats/csvs.py", line 283, in _save_header
    writer.writerow(encoded_labels)
.../lib/python3.8/zipfile.py", line 1137, in write
TypeError: a bytes-like object is required, not 'str'
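Since the failure is a text-versus-bytes mismatch, one standard-library workaround (a sketch, not taken from the original thread) is to wrap the binary handle returned by zf.open(name, "w") in io.TextIOWrapper, which encodes the text that to_csv writes into bytes. The example dataframes, the member names, index=False, and the in-memory io.BytesIO archive are assumptions for illustration; passing "archive.zip" instead of the BytesIO object writes the archive to disk.

```python
import io
import zipfile

import pandas as pd

# Hypothetical example data: member name -> dataframe
frames = {
    "file1.csv": pd.DataFrame({"x": [1, 2, 3]}),
    "file2.csv": pd.DataFrame({"y": [4, 5, 6]}),
}

# Build the archive in memory; a filename such as "archive.zip" works the same way
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for name, frame in frames.items():
        # zf.open(..., "w") yields a binary handle; TextIOWrapper turns str into UTF-8 bytes
        with zf.open(name, "w") as binary_handle:
            with io.TextIOWrapper(binary_handle, encoding="utf-8", newline="") as text_handle:
                frame.to_csv(text_handle, index=False)

with zipfile.ZipFile(archive) as zf:
    print(zf.namelist())  # ['file1.csv', 'file2.csv']
```

newline="" follows the usual advice for CSV output: it stops TextIOWrapper from translating the line endings that to_csv already writes.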

I also tried the compression parameter of to_csv, but each call overwrites the archive, so only the last dataframe remains in the final archive.
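The overwriting can be reproduced with a small sketch. It assumes a pandas version that accepts compression as a dict with an archive_name entry (1.0 or later, as far as I know). With the default mode="w", each to_csv call opens the zip path for writing, so the second call replaces the archive created by the first. The temporary directory and file names exist only for the demonstration.

```python
import os
import tempfile
import zipfile

import pandas as pd

df1 = pd.DataFrame({"x": [1, 2, 3]})
df2 = pd.DataFrame({"y": [4, 5, 6]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "archive.zip")
    # Each call opens archive.zip in write mode, replacing the previous archive
    df1.to_csv(path, compression={"method": "zip", "archive_name": "file1.csv"})
    df2.to_csv(path, compression={"method": "zip", "archive_name": "file2.csv"})
    with zipfile.ZipFile(path) as zf:
        members = zf.namelist()

print(members)  # ['file2.csv']
```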

If there is no other way, I'll use temporary files, but I was wondering if anyone has an idea for getting to_csv and zipfile to work together.

Thanks in advance!


Solution

  • I would approach this the following way:

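    import io
    import pandas as pd

    df = pd.DataFrame({"x": [1, 2, 3]})
    # Write the CSV text into an in-memory text buffer
    string_io = io.StringIO()
    df.to_csv(string_io)
    # Rewind, read the text back and encode it to bytes, which zipfile accepts
    string_io.seek(0)
    df_bytes = string_io.read().encode('utf-8')
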

    Since df_bytes is bytes, it should now work with zipfile. Edit: after looking at the to_csv help, I found a simpler way to get the bytes:

    import pandas as pd
    df = pd.DataFrame({"x": [1, 2, 3]})
    # With no path argument, to_csv returns the CSV as a str
    df_bytes = df.to_csv().encode('utf-8')
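Applied to several dataframes, the same idea could look like the following minimal sketch, assuming UTF-8 output. Each to_csv() string is encoded to bytes and stored with ZipFile.writestr, so nothing is written to disk until you choose to. The frames dictionary, the helper name frames_to_zip_bytes, and index=False are hypothetical choices for illustration (drop index=False to keep the default row index).

```python
import io
import zipfile

import pandas as pd

# Hypothetical example data: member name -> dataframe
frames = {
    "file1.csv": pd.DataFrame({"x": [1, 2, 3]}),
    "file2.csv": pd.DataFrame({"y": [4, 5, 6]}),
}


def frames_to_zip_bytes(frames):
    """Hypothetical helper: return a zip archive (as bytes) with one CSV per dataframe."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, frame in frames.items():
            # to_csv() without a path returns a str; encode it to bytes for the archive
            zf.writestr(name, frame.to_csv(index=False).encode("utf-8"))
    return buffer.getvalue()


zip_bytes = frames_to_zip_bytes(frames)
```

To save the result, either write zip_bytes to a file opened with "wb" or pass a filename such as "archive.zip" to zipfile.ZipFile instead of the BytesIO object. ZipFile.writestr also accepts a str and encodes it as UTF-8 itself, so the explicit .encode("utf-8") mainly makes the encoding visible.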