Tags: python, stream, zip, minio

Python: Creating Zip file from Minio objects results in duplicate entries for each file


In my application, I need to fetch files from Minio storage and create a zip file from them. Some files can be very large, so I'm trying to write them in chunks to handle the process more efficiently. However, the resulting zip file contains multiple entries with the same file name; I assume these are the chunks. How can I combine the chunks so that the zip file contains only the original file? Or is there a better way to write large files into a zip?

This is the code block where I write the chunks:

        zip_buffer = io.BytesIO()
        with zipfile.ZipFile(zip_buffer, "w") as zip_file:
            for url in minio_urls:
                file_name = url.split("/")[-1]

                # Retrieve the Minio object
                minio_object, object_name = get_object_from_minio(url)

                stream = minio_object.stream()

                while True:
                    chunk = next(stream, None)  # Read the next chunk
                    if chunk is None:
                        break
                    zip_file.writestr(file_name, chunk)
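A minimal standalone sketch that reproduces the symptom without Minio. It assumes a hypothetical `chunks_of()` generator standing in for `minio_object.stream()`; the file name `report.txt` is also made up for illustration.

```python
import io
import warnings
import zipfile


def chunks_of(data: bytes, size: int):
    """Hypothetical stand-in for minio_object.stream(): yields fixed-size chunks."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


zip_buffer = io.BytesIO()
with warnings.catch_warnings():
    # zipfile warns "Duplicate name" for every writestr() after the first one
    warnings.simplefilter("ignore")
    with zipfile.ZipFile(zip_buffer, "w") as zip_file:
        for chunk in chunks_of(b"x" * 10, 4):
            zip_file.writestr("report.txt", chunk)  # each call adds a NEW member

# Re-open the archive and list its members
with zipfile.ZipFile(zip_buffer) as zf:
    names = zf.namelist()

print(names)  # ['report.txt', 'report.txt', 'report.txt']
```

Ten bytes split into 4-byte chunks gives three chunks, and therefore three separate members that all share the same name.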

Solution

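  • zip_file.writestr() writes a complete entry in a single call, so use it only when you have the entry's entire contents at once. Each call adds a new member to the archive. That is why writing one chunk per call produces several entries with the same name; zipfile also emits a "Duplicate name" warning for each one. To write an entry one chunk at a time, open it with ent = zip_file.open(file_name, "w") and then call ent.write(chunk) for each chunk. Write mode for ZipFile.open() requires Python 3.6 or later.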
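A minimal sketch of the `zip_file.open(..., "w")` approach, again without Minio. The `objects` dict and the `chunks_of()` generator are hypothetical stand-ins for what `get_object_from_minio()` and `minio_object.stream()` provide. The chunk size and the choice of `ZIP_DEFLATED` are arbitrary.

```python
import io
import zipfile


def chunks_of(data: bytes, size: int):
    """Hypothetical stand-in for minio_object.stream(): yields fixed-size chunks."""
    for i in range(0, len(data), size):
        yield data[i:i + size]


# Hypothetical in-memory "objects" in place of real Minio downloads
objects = {
    "a.txt": b"hello " * 1000,
    "b.bin": bytes(range(256)) * 50,
}

zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w", compression=zipfile.ZIP_DEFLATED) as zip_file:
    for file_name, data in objects.items():
        # Open ONE member per file and stream every chunk into it.
        # force_zip64=True allows a member to exceed 2 GiB when its size is not
        # known up front; without it, zipfile raises RuntimeError when closing
        # a member that grew past that limit.
        with zip_file.open(file_name, "w", force_zip64=True) as ent:
            for chunk in chunks_of(data, 4096):
                ent.write(chunk)

# Read the archive back to check that each file appears once, intact
with zipfile.ZipFile(zip_buffer) as zf:
    names = zf.namelist()
    round_trip = {name: zf.read(name) for name in names}

print(names)  # ['a.txt', 'b.bin']
```

In the question's code, the `while True` loop would become `with zip_file.open(file_name, "w") as ent:` wrapping a loop that calls `ent.write(chunk)` for each chunk of `minio_object.stream()`. Writing into `io.BytesIO` still keeps the whole compressed archive in memory. For very large inputs, passing a file path (or an open file) to `zipfile.ZipFile` writes the archive to disk instead.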