Tags: python, zip, python-zipfile

How do I extract a file with the Python zipfile library while changing its name?


This is motivated by pathfile issues (unfortunately this doesn't seem to be true in my case).

I have a zipfile that I am trying to extract with Python. The zipfile appears to have been created on Windows. The code I have to extract the files from the zipfile looks like this:

import os
import zipfile

def unzip_file(zipfile_path):
    z = zipfile.ZipFile(zipfile_path)
    # get pathname without extension
    directory = os.path.splitext(zipfile_path)[0]
    print(directory)
    if not os.path.exists(directory):
        os.makedirs(directory)
    # this line doesn't work: it tries to extract "Foobar\\baz.quux" to directory
    # and complains that the directory doesn't exist
    # z.extractall(directory)
    for name in z.namelist():
        # actual dirname we want is this
        # (dirname, filename) = os.path.split(name)
        # I've tried to be cross-platform (see above), but apparently zipfiles save
        # filenames as Foobar\filename.log, so I need this for cygwin
        dir_and_filename = name.split('\\')
        if len(dir_and_filename) > 1:
            dirname = dir_and_filename[0:-1]
            filename = dir_and_filename[-1]
        else:
            dirname = ['']
            filename = dir_and_filename[0]

        out_dir = os.path.join(directory, *dirname)
        print("Decompressing " + name + " on " + out_dir)
        if not os.path.exists(out_dir):
            os.makedirs(out_dir)
        # extract() uses the full stored member name, so filename is never used
        z.extract(name, out_dir)
    return directory

While this seems overly complicated, it is an attempt to work around some bugs I've found. One member of the zipfile is Foobar\\filename.log. When I try to extract it, the library complains that the directory doesn't exist. I need a method like this:

zipfile.extract_to(member_name, directory_name, file_name_to_write)

where member_name is the name of the member to be read (in this example Foobar\\filename.log), directory_name is the name of the directory that we want to write to, and file_name_to_write is the name of the file that we want to write (in this case it would be filename.log). This does not seem to be supported. Does anyone have any other ideas for a cross-platform way of extracting this kind of zip archive, which contains nested directories?
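For illustration, the requested call can be approximated with ZipFile.open() and shutil.copyfileobj(), both of which are in the standard library. This is a minimal sketch, not part of the zipfile module: the function name extract_to and its parameters are taken from the hypothetical signature above, and the archive argument is assumed to be an already opened ZipFile.

```python
import os
import shutil
import zipfile


def extract_to(archive, member_name, directory_name, file_name_to_write):
    """Copy one archive member to directory_name/file_name_to_write.

    Hypothetical helper named after the call in the question; it is not
    provided by the zipfile module itself.
    """
    if not os.path.isdir(directory_name):
        os.makedirs(directory_name)
    target_path = os.path.join(directory_name, file_name_to_write)
    # ZipFile.open() looks the member up by its stored name, so the
    # backslash in that name is never used to build a filesystem path.
    with archive.open(member_name) as source, open(target_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return target_path
```

Inside the loop from the question, the call would then be something like extract_to(z, name, out_dir, filename) in place of z.extract(name, out_dir). Members that represent directories rather than files would presumably need to be skipped by the caller.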

According to this answer, the zipfile I have may not meet the zipfile specification, which says in section 4.4.17:

All slashes MUST be forward slashes '/' as opposed to backwards slashes '\' for compatibility with Amiga and UNIX file systems etc.

How do I solve this problem?
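One way to apply the forward-slash rule at extraction time is to rewrite each member's name in memory before extracting it. The sketch below assumes that ZipFile.extract() accepts a ZipInfo object and builds the output path from its filename attribute, which is how CPython's implementation appears to behave; this is worth verifying on the Python version in use. The function name extract_with_forward_slashes is made up for this example.

```python
import zipfile


def extract_with_forward_slashes(zipfile_path, directory):
    """Extract every member, treating '\\' in stored names as '/'."""
    with zipfile.ZipFile(zipfile_path) as archive:
        for info in archive.infolist():
            # Only the in-memory name used to build the output path is
            # rewritten; the archive file on disk is left untouched.
            info.filename = info.filename.replace("\\", "/")
            archive.extract(info, directory)
    return directory
```

With this approach a member stored as Foobar\filename.log should end up as filename.log inside a Foobar subdirectory of the target directory.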


Solution

  • I solved this by simply shelling out to unzip. An exit code of 0 or 1 has to be treated as success, because unzip returns 1 for the malformed zipfile (the message given is something like "warning: zipfile appears to contain backslashes as path separators").

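    #!/bin/bash
    # $1 is the zipfile, $2 is the directory to extract into;
    # both are quoted so that paths containing spaces survive
    unzip "$1" -d "$2"
    exit_code=$?
    # we catch exit_codes < 2 as the zipfiles are malformed
    if [ "$exit_code" -lt 2 ]
    then exit 0
    else exit "$exit_code"
    fi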
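    Since the caller in the question is Python, the same check can also be done without a wrapper script. The following is a rough equivalent using subprocess.run(), assuming an unzip executable is on the PATH; the function names are invented for this sketch.

```python
import subprocess


def normalize_unzip_exit_code(exit_code):
    """Map unzip's exit codes 0 and 1 to success; pass others through."""
    # Exit code 1 is what unzip returned for the archive with backslashes.
    return 0 if exit_code in (0, 1) else exit_code


def unzip_tolerant(zip_path, dest_dir):
    """Run `unzip zip_path -d dest_dir`, tolerating the warning exit code."""
    # Passing a list avoids shell quoting problems with unusual paths.
    result = subprocess.run(["unzip", zip_path, "-d", dest_dir])
    return normalize_unzip_exit_code(result.returncode)
```

    A caller would treat a return value of 0 from unzip_tolerant() as success and anything else as a real failure.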