python, python-zipfile, path-separator

Problem with the Python zipfile library when sharing a file between Linux and Windows


The zipfile module is very useful for managing .zip files with Python.

However, if the .zip file was created on Linux or macOS, the path separator is of course '/', and working with the file on Windows can cause a problem because the separator there is '\'. For example, to determine the root directory stored in the .zip file, one might think of something like:

from zipfile import ZipFile, is_zipfile
import os

if is_zipfile(filename):

    with ZipFile(filename, 'r') as zip_ref:
        packages_name = [member.split(os.sep)[0] for member in zip_ref.namelist()
                         if (len(member.split(os.sep)) == 2 and not
                                                       member.split(os.sep)[-1])]

But in this case, on Windows we always get packages_name = [] because os.sep is "\" there, whereas the archive was compressed on Linux and its paths look like 'foo1/foo2'.
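
The failure can be reproduced on any OS. Below is a minimal sketch that uses the standard ntpath and posixpath modules to stand in for os.path on Windows and on Linux/macOS respectively; the member name 'foo1/' is a made-up example of a top-level directory entry.

```python
import ntpath      # path rules used on Windows; ntpath.sep is '\\'
import posixpath   # path rules used on Linux/macOS; posixpath.sep is '/'

member = 'foo1/'   # hypothetical directory entry, as namelist() would return it

# Splitting on the Windows separator leaves the name in one piece,
# so the test len(parts) == 2 can never succeed.
windows_parts = member.split(ntpath.sep)

# Splitting on '/' gives the directory name plus an empty last element.
posix_parts = member.split(posixpath.sep)

print(windows_parts)  # ['foo1/']
print(posix_parts)    # ['foo1', '']
```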

In order to handle all cases (compression on Linux and use on Windows, or the opposite), I want to use:

from zipfile import ZipFile, is_zipfile
import os

if is_zipfile(filename):

    with ZipFile(filename, 'r') as zip_ref:

        if all([True if '/' in el else
                False for el in zip_ref.namelist()]):
            packages_name = [member.split('/')[0] for member in zip_ref.namelist()
                             if (len(member.split('/')) == 2 and not
                                                       member.split('/')[-1])]

        else:
            packages_name = [member.split('\\')[0] for member in zip_ref.namelist()
                             if (len(member.split('\\')) == 2 and not
                                                           member.split('\\')[-1])]

What do you think of this? Is there a more direct or more pythonic way to do the job?


Solution

  • Thanks to @snakecharmerb's answer and the link he suggested, I now understand. Thank you @snakecharmerb for showing me the way. As described in that link, zipfile internally uses only '/', independently of the OS. As I like to see things concretely, I ran this little test:

    • On Windows, I used the usual graphical tools of the OS (not the command line) to create a file testZipWindows.zip containing this tree structure:

      • testZipWindows
        • foo1.txt
        • InFolder
          • foo2.txt
    • I did the same thing on Linux (again without using the command line) for the testZipFedora.zip archive:

      • testZipFedora
        • foo1.txt
        • InFolder
          • foo2.txt

    This is the result:

    $ python3
    Python 3.7.9 (default, Aug 19 2020, 17:05:11) 
    [GCC 9.3.1 20200408 (Red Hat 9.3.1-2)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from zipfile import ZipFile
    >>> with ZipFile('/home/servoz/Desktop/test/testZipWindows.zip', 'r') as WinZip:
    ...  WinZip.namelist()
    ... 
    ['testZipWindows/', 'testZipWindows/foo1.txt', 'testZipWindows/InFolder/', 'testZipWindows/InFolder/foo2.txt']
    >>> with ZipFile('/home/servoz/Desktop/test/testZipFedora.zip', 'r') as fedZip:
    ...  fedZip.namelist()
    ... 
    ['testZipFedora/', 'testZipFedora/foo1.txt', 'testZipFedora/InFolder/', 'testZipFedora/InFolder/foo2.txt']
    

    So it all becomes clear! We should indeed use os.path.sep for file system paths to work properly across platforms, but when dealing with member names in the zipfile library it is necessary to use '/' as the separator and not os.sep (or os.path.sep). That was my mistake!
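
This behaviour can also be checked on a single machine. The following is a minimal sketch, not part of the original test: the directory and file names (demo, InFolder, foo2.txt) are made up, and the archive name is deliberately built with os.path.join so that it contains the platform separator before being stored.

```python
import os
import tempfile
from zipfile import ZipFile


def stored_names():
    """Zip one file whose arcname uses os.sep and return the stored names."""
    with tempfile.TemporaryDirectory() as tmp:
        # Build a small tree on disk with the platform's own separator.
        folder = os.path.join(tmp, 'demo', 'InFolder')
        os.makedirs(folder)
        source = os.path.join(folder, 'foo2.txt')
        with open(source, 'w') as handle:
            handle.write('hello')

        archive = os.path.join(tmp, 'demo.zip')
        with ZipFile(archive, 'w') as zip_ref:
            # On Windows this arcname contains '\\'; on Linux it contains '/'.
            zip_ref.write(source,
                          arcname=os.path.join('demo', 'InFolder', 'foo2.txt'))

        with ZipFile(archive, 'r') as zip_ref:
            return zip_ref.namelist()


names = stored_names()
print(names)
```

On both Linux and Windows the stored name should come back with '/' as its only separator.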

    So the code to use in a multiplatform way for the example in my question is just:

    from zipfile import ZipFile, is_zipfile

    # filename: path to the archive to inspect
    if is_zipfile(filename):

        with ZipFile(filename, 'r') as zip_ref:
            # Member names always use '/', whatever the OS
            packages_name = [member.split('/')[0] for member in zip_ref.namelist()
                             if (len(member.split('/')) == 2 and not
                                 member.split('/')[-1])]
    

    And not all the useless things I had imagined...
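
As for the "more Pythonic" part of the question, one possible sketch is a small helper that relies on the '/' convention directly instead of splitting each name three times. The function name top_level_dirs and the in-memory test archive are illustrative assumptions, not part of the original post.

```python
from io import BytesIO
from zipfile import ZipFile


def top_level_dirs(names):
    """Return the top-level directory names among ZIP member names.

    A top-level directory entry contains exactly one '/' and ends
    with it, e.g. 'testZip/'.
    """
    return [name[:-1] for name in names
            if name.endswith('/') and name.count('/') == 1]


# Build a small archive in memory, mirroring the tree used in the test above.
buffer = BytesIO()
with ZipFile(buffer, 'w') as zip_ref:
    zip_ref.writestr('testZip/', '')
    zip_ref.writestr('testZip/foo1.txt', 'a')
    zip_ref.writestr('testZip/InFolder/', '')
    zip_ref.writestr('testZip/InFolder/foo2.txt', 'b')

with ZipFile(buffer, 'r') as zip_ref:
    packages_name = top_level_dirs(zip_ref.namelist())

print(packages_name)  # ['testZip']
```

Like the original comprehension, this relies on the archive containing explicit directory entries; an archive that stores only files would give an empty list.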