
Can we get the timestamp information with os.listdir in Python (like ls -l)?


I connect to an SFTP server and display the files sorted by their modification timestamp.

Currently, it is done using something like:

  1. files = os.listdir(SFTP)
  2. Loop over files and get the timestamp using os.stat.
  3. Sort the final list in Python.
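
For reference, steps 1 to 3 amount to something like the following minimal sketch. The function name is hypothetical, and sorting newest-first is an assumption. The important part is the os.stat() call inside the loop: one call per file, which on a network share means one round trip per file.

```python
import os


def list_by_mtime_listdir(path):
    """Return (name, mtime) pairs sorted newest first (hypothetical helper).

    os.listdir() yields bare names only, so every file needs its own
    os.stat() call to obtain the modification time.
    """
    entries = []
    for name in os.listdir(path):  # step 1: names only
        # Step 2: one extra stat per file; this is the costly part.
        st = os.stat(os.path.join(path, name))
        entries.append((name, st.st_mtime))
    # Step 3: sort in Python, most recently modified first.
    entries.sort(key=lambda item: item[1], reverse=True)
    return entries
```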

The loop in step 2 is very costly when the SFTP share is on a different server, because it makes a separate network call from our server to the SFTP host for every single file.

Is there a way to get both the file and modified time using os.listdir or a similar API?

I am using a Windows back end, and the SFTP connection is usually made with the win32wnet.WNetAddConnection2 function (pywin32). A generic solution would be ideal, but a Windows-specific one is fine too.


Solution

  • If you're using Windows, you have a lot to gain by using os.scandir() (Python 3.5+) or, on older versions, the backport scandir module: scandir.scandir()

    That's because on Windows (as opposed to Linux/Unix), os.listdir() already receives each file's metadata (size, timestamps, file type) from the directory listing, but throws everything away except the name. That forces you to make a separate stat call for every file.

    scandir returns an iterator of directory entries (os.DirEntry objects), not just names. On Windows, the size, timestamp and file type fields are already filled in, so calling stat() on an entry (as shown in the example below) needs no extra system call:

    (taken from https://www.python.org/dev/peps/pep-0471/)

    import os

    def get_tree_size(path):
        """Return total size of files in given path and subdirs."""
        total = 0
        for entry in os.scandir(path):
            if entry.is_dir(follow_symlinks=False):
                total += get_tree_size(entry.path)
            else:
                total += entry.stat(follow_symlinks=False).st_size
        return total
    

    So just replace your os.listdir() call with os.scandir(), and on Windows you get the names and the timestamps for roughly the same cost as a plain os.listdir().

    (The gain is largest on Windows and much smaller on Linux. On a slow file system on Windows, I got an 8x speedup in my case compared to the good old os.listdir() followed by os.path.isdir().)
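
Applied to the question, a minimal sketch that lists a directory and sorts it by modification time with os.scandir() could look like this. It has not been tested against a share mapped with WNetAddConnection2; the function name is hypothetical, and skipping subdirectories and sorting newest-first are assumptions about what "show files" means.

```python
import os


def list_by_mtime_scandir(path):
    """Return (name, mtime) pairs for regular files, newest first (hypothetical helper).

    On Windows, entry.stat(follow_symlinks=False) is answered from the data
    the directory listing already returned, so there is no per-file round
    trip. On Linux/Unix it still costs one stat() call per entry.
    """
    # Using scandir as a context manager requires Python 3.6+.
    with os.scandir(path) as it:
        entries = [
            (entry.name, entry.stat(follow_symlinks=False).st_mtime)
            for entry in it
            # Assumption: only regular files are wanted, not subdirectories.
            if entry.is_file(follow_symlinks=False)
        ]
    # Most recently modified first.
    entries.sort(key=lambda item: item[1], reverse=True)
    return entries
```

The DirEntry objects are consumed inside the with block, so the directory handle is closed before sorting. If you also need fields that Windows leaves at zero in entry.stat() (st_ino, st_dev, st_nlink), you would have to fall back to os.stat() for those entries.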