python ftp ftputil

Only one FTP query to get directory files and metadata


I'm working on a project that needs solid performance.

I need to analyze the contents of an FTP folder (and its sub-folders) with a single FTP request (to avoid making a call per file, which I could do with ftp_host.stat(file_path)).

For each folder, I need to retrieve:

  • The file/folder name (and whether it's a folder or a file)
  • File size
  • The file's last modification date

I'm working with ftputil in Python. The framework I'm working in requires this information (I can't use download_if_newer()). I tried with the code:

ftp_host._dir(folder)

This code returns almost everything I need in this line:

-rw-r--r-- 1 test ftpusers 37 Aug 13 09:37 fruits.csv

which gives me this information:

  • leading - indicates that it's a file (d for directory)
  • 37 is the size
  • and fruits.csv is the name

However, the date is incomplete: I only get Aug 13 09:37. The year is missing.
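
For illustration, here is how the fields of such a line can be pulled apart with a plain split. This is only a sketch: it assumes the common Unix-style listing with nine whitespace-separated columns, and the variable names are made up for the example.

```python
line = "-rw-r--r-- 1 test ftpusers 37 Aug 13 09:37 fruits.csv"

# Split into at most 9 fields so that a name containing spaces stays in one piece
perms, links, owner, group, size, month, day, time_or_year, name = line.split(None, 8)

entry = {
    "name": name,
    "is_dir": perms.startswith("d"),   # leading "d" means directory, "-" means file
    "size": int(size),
    "date_fields": (month, day, time_or_year),  # month, day, time: no year available
}
print(entry)
```

The last date column holds a time here, so the year simply isn't part of the line.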

Is there a solution for retrieving the year as well?


Solution

  • The ftputil library doesn't let you retrieve the exact date when the year isn't the current year. _dir(), stat() and path.getmtime() are all affected by this problem, since they all work from the same directory listing.

    If you can use ftplib, you can use this code to obtain the files and their metadata:

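    import ftplib
    
    # hostname, username and password are your own connection settings
    session = ftplib.FTP(hostname, username, password)
    
    # A single MLSD request returns every entry of the directory with the requested facts
    entries = session.mlsd("", ["Modify", "Type", "Size"])
    
    # or, if you're using ftputil (note that _session is a private attribute):
    # entries = ftp_host._session.mlsd("", ["Modify", "Type", "Size"])
    for x in entries:
        print(x)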
    

    prints

    ('file.csv', {'modify': '20240813125957', 'size': '855', 'type': 'file'})
    ('folder_name', {'modify': '20240813130350', 'type': 'dir'})
    ('old_file.csv', {'modify': '20231109154538', 'size': '1747142067', 'type': 'file'})
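
The fact values come back as strings, so they usually need converting before use. Below is a minimal sketch of that step; `parse_mlsd_entry` is a hypothetical helper name, not part of ftplib or ftputil, and it assumes the server reports `modify` in the RFC 3659 format (UTC, `YYYYMMDDHHMMSS`, optionally followed by fractional seconds).

```python
from datetime import datetime, timezone


def parse_mlsd_entry(name, facts):
    """Turn one (name, facts) pair from mlsd() into a dict with typed values."""
    modify = facts.get("modify")
    modified = None
    if modify:
        # Keep the first 14 digits and ignore optional fractional seconds
        modified = datetime.strptime(modify[:14], "%Y%m%d%H%M%S").replace(
            tzinfo=timezone.utc
        )
    size = facts.get("size")
    return {
        "name": name,
        "is_dir": facts.get("type") == "dir",
        # Directories may come without a size fact, as in the output above
        "size": int(size) if size is not None else None,
        "modified": modified,
    }


# Sample data copied from the output above
sample = [
    ("file.csv", {"modify": "20240813125957", "size": "855", "type": "file"}),
    ("folder_name", {"modify": "20240813130350", "type": "dir"}),
    ("old_file.csv", {"modify": "20231109154538", "size": "1747142067", "type": "file"}),
]
for name, facts in sample:
    print(parse_mlsd_entry(name, facts))
```

With this, `old_file.csv` keeps its 2023 modification year, which is the part missing from the plain listing.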
    

    The facts passed as parameters to session.mlsd here (Modify, Type and Size) are listed in RFC 3659, section 7.2.
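
Since the question also mentions sub-folders: mlsd lists one directory per call, so covering a whole tree takes one request per directory rather than one per file. The following is a rough sketch of such a walk; `walk_mlsd` is a hypothetical helper written for this answer, and it assumes the server supports MLSD and uses `/` as the path separator.

```python
import ftplib


def walk_mlsd(session: ftplib.FTP, path: str = ""):
    """Yield (full_path, facts) for every entry below path.

    Sends one MLSD request per directory, not one per file.
    """
    for name, facts in session.mlsd(path, ["Modify", "Type", "Size"]):
        kind = facts.get("type", "").lower()
        # Skip the entries describing the listed directory itself and its parent
        if kind in ("cdir", "pdir"):
            continue
        full_path = f"{path}/{name}" if path else name
        yield full_path, facts
        if kind == "dir":
            # Recurse into the sub-folder with another MLSD request
            yield from walk_mlsd(session, full_path)


# Example usage (needs a reachable server, so it is left commented out):
# session = ftplib.FTP(hostname, username, password)
# for full_path, facts in walk_mlsd(session):
#     print(full_path, facts)
```

If the server doesn't support MLSD, ftplib raises an error on the first call, so this only helps on servers that implement RFC 3659.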