python 7zip pysftp py7zlib

pysftp + py7zr: decompression hangs on some archives


Some context: a client uploads 7-Zip archives to a remote SFTP server, and I process them.

My issue is that on some 7-Zip files my program hangs: the decompress call never returns, even though all the files from the archive are present on the local server.

I managed to get a stack trace by pressing Ctrl+C in the terminal:

[terminal screen](https://i.sstatic.net/O62U4.png)
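
As an alternative to pressing Ctrl+C, the standard-library `faulthandler` module can dump the traceback of every thread once a call has been running for too long. The following is only a minimal sketch: the helper name `run_with_watchdog` and the 300-second default are assumptions, not part of the original script.

```python
import faulthandler
import sys


def run_with_watchdog(func, *args, timeout=300, **kwargs):
    """Call func(*args, **kwargs) with a traceback watchdog armed.

    If the call is still running after `timeout` seconds, the traceback of
    every thread is written to stderr. The call itself is NOT interrupted.
    `run_with_watchdog` is a hypothetical helper name.
    """
    faulthandler.dump_traceback_later(timeout, exit=False, file=sys.stderr)
    try:
        return func(*args, **kwargs)
    finally:
        # Disarm the watchdog as soon as the call returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

It could wrap the extraction step, for example `run_with_watchdog(z.extractall, self.path_retour, timeout=120)`, so that the log shows where the process is stuck without anyone watching the terminal.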

My code:

    import os

    import py7zr
    import pysftp


    def download_from_sftp(self):
        # Method of the class that holds the connection settings and local paths.
        with pysftp.Connection(host=self.hostname, port=self.port, username=self.user,
                               password=self.password, cnopts=self.cnopts) as sftp:
            self.logger.debug("Connection successfully established ... ")

            sftp.cwd(self.path)  # Switch to a remote directory

            directory_structure = sftp.listdir_attr()

            self.logger.debug("Downloading zip files :")
            for attr in directory_structure:
                self.logger.debug(attr.filename + " " + str(attr))
                sftp.get(attr.filename, self.path_retour + attr.filename)

                # Extract the downloaded archive into the local folder
                with py7zr.SevenZipFile(self.path_retour + attr.filename, mode='r') as z:
                    z.extractall(self.path_retour)

                os.rename(self.path_retour + attr.filename, self.path_archive_python + attr.filename)  # move zip to archive folder on local server
                sftp.remove(attr.filename)  # delete zip on remote server

This issue happens for maybe 1 in 1000 7-Zip archives (most of the archives are < 1 MB). I checked the integrity of the affected archives and they are valid. On my desktop, py7zr extracts all files from these archives without crashing or hanging.
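
Since only a rare archive triggers the hang, one way to keep the rest of the batch moving is to run each extraction in a child interpreter with a time limit. This is a rough sketch under assumptions: the function name `extract_with_timeout`, the 120-second default, and the idea of flagging slow archives for manual inspection are all illustrative and not taken from the original script.

```python
import subprocess
import sys

# Small script executed in a child interpreter; argv[1] is the archive path
# and argv[2] is the destination directory.
_EXTRACT_SNIPPET = (
    "import sys, py7zr\n"
    "with py7zr.SevenZipFile(sys.argv[1], mode='r') as z:\n"
    "    z.extractall(path=sys.argv[2])\n"
)


def extract_with_timeout(archive_path, dest_dir, timeout=120, snippet=_EXTRACT_SNIPPET):
    """Extract an archive in a child process (hypothetical helper).

    Returns True if the child finished in time, False if it had to be
    killed after `timeout` seconds. A failing extraction raises
    subprocess.CalledProcessError.
    """
    try:
        subprocess.run(
            [sys.executable, "-c", snippet, archive_path, dest_dir],
            check=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return False
    return True
```

An archive for which this returns `False` could be left on the remote server and logged, rather than blocking every archive that comes after it.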

I suspect the SFTP connection may be responsible for the hang.
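
One way to test that hypothesis would be to split the work into two phases, so that extraction only starts once the SFTP connection is closed. The sketch below assumes `sftp` is an open `pysftp.Connection`; the function names `download_archives` and `extract_archives` are made up for illustration, and the remote cleanup step from the original code is left out.

```python
import os


def download_archives(sftp, local_dir):
    """Download every entry of the current remote directory.

    `sftp` is assumed to be an open pysftp.Connection (anything providing
    compatible listdir_attr() and get() methods works). Returns the list
    of local file paths.
    """
    local_paths = []
    for attr in sftp.listdir_attr():
        local_path = os.path.join(local_dir, attr.filename)
        sftp.get(attr.filename, local_path)
        local_paths.append(local_path)
    return local_paths


def extract_archives(local_paths, dest_dir):
    """Extract each downloaded archive; meant to run with no connection open."""
    import py7zr  # imported lazily so the download phase does not need it

    for path in local_paths:
        with py7zr.SevenZipFile(path, mode="r") as archive:
            archive.extractall(path=dest_dir)


# Intended call order (connection arguments omitted):
#
#     with pysftp.Connection(...) as sftp:
#         sftp.cwd(remote_path)
#         paths = download_archives(sftp, local_dir)
#     extract_archives(paths, local_dir)  # the connection is closed here
```

If the hang still occurs in `extract_archives`, the open connection can be ruled out as the cause.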

Thanks

--- Edit ---

Following MartinPrikryl's feedback, I ran my whole script on my local computer and it does not hang. It only hangs on that particular archive when the script runs on the server. I noticed that this archive is significantly bigger than the others (~9 MB). However, the server has plenty of disk space (1 TB free), 4 GB of RAM and 4 CPUs, so that should not be an issue.


Solution

  • I did not manage to find out why py7zr was hanging on some archives, but updating from Python 3.7.4 to Python 3.8.5 solved the issue.
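
Because the behaviour differed between interpreters, it may help to have the script log a warning when it runs on an older Python. This is a small sketch; the 3.8 threshold comes only from the observation above and is not a documented py7zr requirement, and the names `MIN_VERSION` and `interpreter_is_recent` are hypothetical.

```python
import logging
import sys

logger = logging.getLogger(__name__)

# Assumption: threshold taken from this report, not from py7zr's documentation.
MIN_VERSION = (3, 8)


def interpreter_is_recent(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    current = tuple(version_info[:2])
    if current < minimum:
        logger.warning(
            "Running on Python %d.%d; hangs in py7zr were only seen before %d.%d",
            current[0], current[1], minimum[0], minimum[1],
        )
        return False
    return True
```

Calling `interpreter_is_recent()` once at startup would make it obvious from the logs if the server is ever rolled back to the older interpreter.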