I connect to an SFTP server and show files based on the modified timestamp.
Currently, it is done using something like:
1. files = os.listdir(SFTP)
2. Loop over files and get the timestamp of each one using os.stat.
The loop in Step 2 is very costly when the SFTP is on a different server, because it has to make a network call from the server to the SFTP for each and every file.
Is there a way to get both the file name and the modified time using os.listdir or a similar API?
I am using a Windows back-end, and the SFTP connection is usually made using win32wnet.WNetAddConnection2. A generic solution would be helpful; if not, a specific solution would be fine too.
If you're using Windows, you have a lot to gain from using os.scandir() (Python 3.5+) or the backport scandir module: scandir.scandir()
That's because on Windows (as opposed to Linux/Unix), os.listdir() already performs a file stat behind the scenes, but the result is discarded except for the name, which forces you to perform another stat call.
scandir returns an iterator of directory entries (os.DirEntry objects), not names. On Windows, the size, timestamp and object type fields are already filled in, so when you perform a stat on the entry (as shown in the example below), it costs no extra system call (symbolic links excepted):
(taken from https://www.python.org/dev/peps/pep-0471/)
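import os

def get_tree_size(path):
    """Return total size of files in given path and subdirs."""
    total = 0
    for entry in os.scandir(path):
        if entry.is_dir(follow_symlinks=False):
            # Recurse into real subdirectories (symlinks are not followed).
            total += get_tree_size(entry.path)
        else:
            # On Windows this stat is served from data the scan already fetched.
            total += entry.stat(follow_symlinks=False).st_size
    return total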
So just replace your first os.listdir() call with os.scandir() and you'll have all the information for the same cost as a simple os.listdir() (this is most interesting on Windows, and a lot less so on Linux; I've used it on a slow filesystem on Windows and got an 8x performance gain compared to good old os.listdir followed by os.path.isdir in my case).
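Applied to the question's actual goal (file names together with their modified times), a minimal sketch could look like the following. The function name list_files_by_mtime and its path argument are made up for illustration, and the with-statement form of os.scandir() assumes Python 3.6+; treat it as a starting point rather than a drop-in solution.

```python
import os

def list_files_by_mtime(path):
    """Return (name, mtime) pairs for regular files in path, newest first.

    Hypothetical helper: a single directory scan replaces the
    os.listdir() + os.stat() loop from the question.
    """
    results = []
    # The context manager form of os.scandir() needs Python 3.6+.
    with os.scandir(path) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; keep regular files only.
            if entry.is_file(follow_symlinks=False):
                mtime = entry.stat(follow_symlinks=False).st_mtime
                results.append((entry.name, mtime))
    # Sort by modification time, most recent first.
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results
```

Calling list_files_by_mtime() on the mounted share path should then give you the listing already sorted, without a separate os.stat() call per file on Windows.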