Tags: python, python-3.6, sftp, pysftp

Optimize the performance of retrieving file sizes with pysftp


I need to get the file details for several locations, some on the local system and some on an SFTP server, and for certain SFTP locations I also need each file's size. The code below does this.

import glob

import pysftp


def getFileDetails(location: str):
    filenames: list = []
    if location.find(":") != -1:
        # Locations containing ":" are matched on the local file system.
        for file in glob.glob(location):
            filenames.append(getFileNameFromFilePath(file))
    else:
        with pysftp.Connection(host=myHostname, username=myUsername, password=myPassword) as sftp:
            # Remote file names, oldest first.
            remote_files = [x.filename for x in sorted(sftp.listdir_attr(location), key=lambda f: f.st_mtime)]
            if location == LOCATION_SFTP_A:
                for filename in remote_files:
                    filenames.append(filename)
                    # One extra stat request per file.
                    sftp_archive_d_size_mapping[filename] = sftp.stat(location + "/" + filename).st_size
            elif location == LOCATION_SFTP_B:
                for filename in remote_files:
                    filenames.append(filename)
                    # One extra stat request per file.
                    sftp_archive_e_size_mapping[filename] = sftp.stat(location + "/" + filename).st_size
            else:
                filenames.extend(remote_files)
            # No explicit sftp.close() needed: the with block closes the connection.
    return filenames

There are more than 10,000 files in LOCATION_SFTP_A and LOCATION_SFTP_B. For each file, I need to get the file size. To get the size I am using:

sftp_archive_d_size_mapping[filename] = sftp.stat(location + "/" + filename).st_size
sftp_archive_e_size_mapping[filename] = sftp.stat(location + "/" + filename).st_size
# Time Taken : 5 min+
sftp_archive_d_size_mapping[filename] = 1 #sftp.stat(location + "/" + filename).st_size
sftp_archive_e_size_mapping[filename] = 1 #sftp.stat(location + "/" + filename).st_size
# Time Taken : 20-30 s

If I comment out sftp.stat(location + "/" + filename).st_size and assign a static value, the entire code takes only 20-30 seconds to run. How can I reduce the run time and still get the file sizes?


Solution

  • Connection.listdir_attr already gives you the file size, in SFTPAttributes.st_size.

    There's no need to call Connection.stat for each file to get the size (again).
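
A minimal sketch of that idea, reusing the names from the question. The helper list_names_and_sizes is hypothetical (it is not part of pysftp), and it accepts any object that exposes listdir_attr, such as an open pysftp.Connection. It assumes the server reports a size for every entry; st_size may be None if it does not.

```python
from typing import Dict, List, Tuple


def list_names_and_sizes(sftp, location: str) -> Tuple[List[str], Dict[str, int]]:
    """Return (file names sorted by mtime, {file name: size}) from one listing.

    `sftp` is assumed to be an open pysftp.Connection, or any object with a
    compatible listdir_attr(path) method.
    """
    # A single listdir_attr call returns one attributes entry per file,
    # carrying filename, st_mtime and st_size, so no per-file stat is needed.
    entries = sorted(sftp.listdir_attr(location), key=lambda e: e.st_mtime)
    filenames = [e.filename for e in entries]
    sizes = {e.filename: e.st_size for e in entries}
    return filenames, sizes


# Possible use inside getFileDetails (names taken from the question):
#
# with pysftp.Connection(host=myHostname, username=myUsername,
#                        password=myPassword) as sftp:
#     filenames, sizes = list_names_and_sizes(sftp, location)
#     if location == LOCATION_SFTP_A:
#         sftp_archive_d_size_mapping.update(sizes)
#     elif location == LOCATION_SFTP_B:
#         sftp_archive_e_size_mapping.update(sizes)
```

With this shape the number of requests no longer grows with the number of files, which is where the per-file stat version spends its time.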

    See also: