Tags: windows, python-2.7, zip, in-memory, stringio

Python: how to create an in-memory zip file whose contents can be files of any format (.txt, .jpg, etc.)


I'm trying to write a class that creates a .zip archive in memory, containing files of any format, so that I can use it later. I found some useful code and built this class:

import zipfile
import StringIO


class InMemoryZip(object):
    def __init__(self):
        # Create the in-memory file-like object
        self.in_memory_zip = StringIO.StringIO()

    def append(self, filename_in_zip, file_contents):
        '''Appends a file with name filename_in_zip and contents of
        file_contents to the in-memory zip.'''
        # Get a handle to the in-memory zip in append mode
        zf = zipfile.ZipFile(self.in_memory_zip, "a", zipfile.ZIP_DEFLATED, False)

        # Write the file to the in-memory zip
        zf.writestr(filename_in_zip, file_contents)

        zf.close()

        return self

    def read(self):
        '''Returns a string with the contents of the in-memory zip.'''
        self.in_memory_zip.seek(0)
        return self.in_memory_zip.read()

    def writetofile(self, filename):
        '''Writes the in-memory zip to a file.'''
        f = file(filename, "w")
        f.write(self.read())
        f.close()

# Run a test
if __name__ == "__main__":
    imz = InMemoryZip()
    imz.append("samples/main.cpp", "//Hello code").append("samples/bee.jpg", open('bee.jpg', 'rb').read())
    imz.writetofile("test.zip")

It works fine if I only compress text files, but I get corrupted zip files with .jpg, .png, ... I've looked for other examples, but everything I found is almost identical to what I already have, such as example1 or example2.

The following code works (but not in-memory):

import zipfile
import glob, os

# open the zip file for writing, and write stuff to it

file = zipfile.ZipFile("test.zip", "w")

for name in glob.glob("samples/*"):
    file.write(name, os.path.basename(name), zipfile.ZIP_DEFLATED)

file.close()

# open the file again, to see what's in it

file = zipfile.ZipFile("test.zip", "r")
for info in file.infolist():
    print info.filename, info.date_time, info.file_size, info.compress_size

So, should I use BytesIO for images, executables, and so on? Do I have to distinguish between file formats?

Note: My OS is Windows 8.1 x64


Solution

  • Are you on Windows? In that case you need to change the way you open the output file in writetofile (note the "b"):

    f = file(filename, "wb")
    

    A zip file contains an essentially random mix of bytes. Some of those bytes are bound to be a \n eventually, and if you don't open the file in binary mode, Windows will convert each of them to \r\n on write. That will corrupt the file.

    It's only a coincidence that it happened to work on text files, probably because they were small.
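
Putting the fix together, here is a minimal sketch of the class with the output file opened in binary mode. Two things in it are my own assumptions rather than part of the answer above: it swaps StringIO.StringIO for io.BytesIO (available in Python 2.7 and Python 3) to make it explicit that the buffer holds bytes, and it uses a made-up binary payload plus a temporary directory instead of bee.jpg so it runs anywhere. No per-format handling is needed; every member is just bytes.

```python
import io
import os
import tempfile
import zipfile


class InMemoryZip(object):
    def __init__(self):
        # BytesIO is a bytes-only in-memory buffer (assumption: swapped in
        # for StringIO.StringIO; the answer above only changes the file mode)
        self.in_memory_zip = io.BytesIO()

    def append(self, filename_in_zip, file_contents):
        '''Adds one member to the in-memory zip and returns self for chaining.'''
        # Reopen the same buffer in append mode for each new member
        with zipfile.ZipFile(self.in_memory_zip, "a", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr(filename_in_zip, file_contents)
        return self

    def read(self):
        '''Returns the raw bytes of the in-memory zip.'''
        return self.in_memory_zip.getvalue()

    def writetofile(self, filename):
        '''Writes the in-memory zip to disk without newline translation.'''
        # "wb" is the actual fix: binary mode leaves every byte untouched
        with open(filename, "wb") as f:
            f.write(self.read())


if __name__ == "__main__":
    # Hypothetical binary payload: every byte value, including \n and \r
    payload = bytes(bytearray(range(256)))
    imz = InMemoryZip()
    imz.append("samples/main.cpp", b"//Hello code").append("samples/blob.bin", payload)

    out_path = os.path.join(tempfile.mkdtemp(), "test.zip")
    imz.writetofile(out_path)

    # Read the archive back and confirm the binary member survived intact
    with zipfile.ZipFile(out_path, "r") as zf:
        assert zf.testzip() is None
        assert zf.read("samples/blob.bin") == payload
```

The same rule applies on the input side: read images and executables with open(name, 'rb'), as the question's test code already does for bee.jpg.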