
Generating a ZipFile from a List of StringIO Objects: CRC Error When Opening the Zip File


I am having trouble generating text files that contain multiple lines of text and adding them to a ZipFile in memory with Python 2.7.

The code below generates a zip file containing 4 text files, each holding a single line of text.

If I change the line "temp[0].write('first in-memory temp file')" to write a multi-line string, the generated zip file has a CRC error.

I have tried escaping the string, but that did not help.

What should I do to generate a ZipFile whose text files can contain multiple lines?

# coding: utf-8

import StringIO
import zipfile

# This is where my zip will be written
buff = StringIO.StringIO()
# This is my zip file
zip_archive = zipfile.ZipFile(buff, mode='w')

temp = []
for i in range(4):
    # One 'memory file' for each file 
    # I want in my zip archive
    temp.append(StringIO.StringIO())

# Writing something to the files, to be able to
# distinguish them
temp[0].write('first in-memory temp file')
temp[1].write('second in-memory temp file')
temp[2].write('third in-memory temp file')
temp[3].write('fourth in-memory temp file')

for i in range(4):
    # The zipfile module provide the 'writestr' method.
    # First argument is the name you want for the file
    # inside your zip, the second argument is the content
    # of the file, in string format. StringIO provides
    # you with the 'getvalue' method to give you the full
    # content as a string
    zip_archive.writestr('temp'+str(i)+'.txt',
                         temp[i].getvalue())

# Here you finish editing your zip. Now all the information is
# in your buff StringIO object
zip_archive.close()

# You can visualize the structure of the zip with this command
# (printdir() prints the listing itself and returns None)
zip_archive.printdir()

# You can also save the file to disk to check if the method works
with open('test.zip', 'w') as f:
    f.write(buff.getvalue())

Solution

  • I'm guessing that you are using Windows? Try opening the output zip file in binary mode, i.e. with 'wb' instead of 'w':

    with open('test.zip', 'wb') as f:
        f.write(buff.getvalue())
    

    In text mode (the default), Python converts newlines ('\n') to the native line-ending sequence, which is '\r\n' on Windows. That causes the CRC check to fail: the CRC is calculated from the data in the StringIO buffer, but the data is then altered ('\n' becomes '\r\n') when it is written to the text-mode file.
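
The explanation above can be checked with a small sketch. It is not the asker's exact script: it assumes `io.BytesIO` in place of `StringIO.StringIO` (available on Python 2.7 and 3), a single file named `temp0.txt`, and a temporary directory for the output, all chosen for illustration. It first shows that converting '\n' to '\r\n' changes the CRC-32 of the content, then writes the archive in binary mode and verifies it with `testzip()`.

```python
import io
import os
import tempfile
import zipfile
import zlib

# Multi-line content, as in the question (bytes, so it runs on 2.7 and 3).
content = b'first line\nsecond line\n'

# Roughly what a Windows text-mode write does to each newline byte.
converted = content.replace(b'\n', b'\r\n')

# The archive records a CRC-32 of the original bytes; the converted bytes
# give a different checksum, which a zip reader reports as a CRC error.
crc_original = zlib.crc32(content) & 0xffffffff
crc_converted = zlib.crc32(converted) & 0xffffffff

# Build the archive in memory.
buff = io.BytesIO()
archive = zipfile.ZipFile(buff, mode='w')
archive.writestr('temp0.txt', content)
archive.close()

# Write the archive in binary mode so its bytes reach the disk unchanged.
# The temporary directory only keeps this sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), 'test.zip')
with open(path, 'wb') as f:
    f.write(buff.getvalue())

# Reopen and verify: testzip() returns None when every CRC matches,
# otherwise the name of the first bad file.
check = zipfile.ZipFile(path)
first_bad_file = check.testzip()
round_tripped = check.read('temp0.txt')
check.close()

print(first_bad_file)
```

If `first_bad_file` comes back as a file name rather than `None`, the archive bytes were altered somewhere between the in-memory buffer and the disk.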