Download files whose names contain a given string from an FTP server using Python


I'm trying to download a large number of files that all share a common string (DEM) from an FTP server. These files are nested inside multiple directories, for example Adair/DEM* and Adams/DEM*.

The FTP server is located here: ftp://ftp.igsb.uiowa.edu/gis_library/counties/ and requires no username or password. So, I'd like to go through each county directory and download the files whose names contain the string DEM.

I've read many questions here on Stack Overflow and the Python documentation, but I cannot figure out how to use ftplib.FTP() to get into the site without a username and password, and I can't figure out how to grep or use glob.glob inside of ftplib or urllib.

Thanks in advance for your help


Solution

  • OK, this seems to work. There may be issues if the script tries to download a directory, or tries to list a plain file as if it were a directory. Exception handling may come in handy to trap the wrong file types and skip them.

    glob.glob cannot work here because it only looks at the local filesystem and the files are on a remote server, but you can use fnmatch to match the names that the server returns.

    Here's the code: it downloads all files matching *DEM* into the directory named by the TEMP environment variable, with one local subdirectory per county.

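    import fnmatch
    import ftplib
    import os
    import posixpath
    import sys
    import tempfile
    
    # TEMP is normally set only on Windows; fall back to the system temp directory
    output_root = os.getenv("TEMP") or tempfile.gettempdir()
    
    fc = ftplib.FTP("ftp.igsb.uiowa.edu")
    fc.login()   # no arguments: anonymous login, no username or password needed
    fc.cwd("/gis_library/counties")
    
    root_dirs = fc.nlst()   # one entry per county directory
    for l in root_dirs:
        sys.stderr.write(l + " ...\n")
        dir_files = fc.nlst(l)
        local_dir = os.path.join(output_root, l)
        if not os.path.exists(local_dir):
            os.mkdir(local_dir)
    
        for f in dir_files:
            # some servers return "county/file" from NLST, others just "file"
            f = posixpath.basename(f)
            if fnmatch.fnmatch(f, "*DEM*"):   # cannot use glob.glob on a remote server
                sys.stderr.write("downloading " + l + "/" + f + " ...\n")
                local_filename = os.path.join(local_dir, f)
                with open(local_filename, 'wb') as fh:
                    fc.retrbinary('RETR ' + l + "/" + f, fh.write)
    
    fc.close()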
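
To try the name matching on its own, without connecting to the server, here is a minimal sketch. The helper name matching_names and the sample file names are made up for illustration; they are not taken from the real server. It uses fnmatch.fnmatchcase rather than fnmatch.fnmatch so that the match is case-sensitive on every platform, which is an assumption about what is wanted here.

```python
import fnmatch
import posixpath


def matching_names(listing, pattern="*DEM*"):
    """Return the entries of an NLST-style listing whose base name matches pattern.

    Hypothetical helper: entries may be bare names or "directory/name" paths,
    so only the last path component is compared against the pattern.
    """
    return [
        name
        for name in listing
        if fnmatch.fnmatchcase(posixpath.basename(name), pattern)
    ]


# Invented sample listing, only for illustration
sample = ["Adair/DEM_3m_I_01.zip", "Adair/roads.zip", "DEM_30m.zip", "readme.txt"]
print(matching_names(sample))
```

Because the filter works on a plain list of strings, it can be applied directly to whatever fc.nlst() returns.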
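
The exception handling mentioned at the top of this answer could look roughly like the sketch below. The function name try_download is made up, and the sketch assumes the server answers a RETR on a directory (or on an unreadable file) with a permanent 5xx error, which ftplib raises as ftplib.error_perm. Other servers may behave differently.

```python
import ftplib
import os
import sys


def try_download(ftp, remote_path, local_path):
    """Try to RETR remote_path into local_path.

    Hypothetical helper: returns True on success and False when the server
    refuses the transfer with a permanent (5xx) error.
    """
    try:
        with open(local_path, "wb") as fh:
            ftp.retrbinary("RETR " + remote_path, fh.write)
    except ftplib.error_perm as exc:
        # The local file is already closed here; remove the empty or partial copy
        if os.path.exists(local_path):
            os.remove(local_path)
        sys.stderr.write("skipping %s: %s\n" % (remote_path, exc))
        return False
    return True
```

In the loop above, the with open(...) block could then be replaced by a call such as try_download(fc, l + "/" + f, local_filename), so that one refused entry does not stop the whole run.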