Retrieve modified DateTime of a file from an FTP Server


Is there a way to find the modified date/time for files on an FTP server in R? I have found a great way to list all of the files that are available, but I only want to download ones that have been updated since my last check. I tried using:

info<-file.info(url)

However, it returns a pretty ugly list of nothing. My url is made up of: "ftp://username:password@FTPServer//filepath.xml"


Solution

  • Directory listing formats differ between FTP servers, so until we see the output from yours, here's an approach you can adapt:

    library(curl)
    library(stringr)
    

    Get the raw directory listing:

    con <- curl("ftp://ftp.FreeBSD.org/pub/FreeBSD/")
    dat <- readLines(con)
    close(con)
    dat
    
    ## [1] "-rw-rw-r--    1 ftp      ftp          4259 May 07 16:18 README.TXT" 
    ## [2] "-rw-rw-r--    1 ftp      ftp            35 Sep 09 21:00 TIMESTAMP"  
    ## [3] "drwxrwxr-x    9 ftp      ftp            11 Sep 09 21:00 development"
    ## [4] "-rw-r--r--    1 ftp      ftp          2566 Sep 09 10:00 dir.sizes"  
    ## [5] "drwxrwxr-x   28 ftp      ftp            52 Aug 23 10:44 doc"        
    ## [6] "drwxrwxr-x    5 ftp      ftp             5 Aug 05 04:16 ports"      
    ## [7] "drwxrwxr-x   10 ftp      ftp            12 Sep 09 21:00 releases"   
    

    Filter out the directories:

    no_dirs <- grep("^d", dat, value=TRUE, invert=TRUE)
    no_dirs
    
    ## [1] "-rw-rw-r--    1 ftp      ftp          4259 May 07 16:18 README.TXT"
    ## [2] "-rw-rw-r--    1 ftp      ftp            35 Sep 09 21:00 TIMESTAMP" 
    ## [3] "-rw-r--r--    1 ftp      ftp          2566 Sep 09 10:00 dir.sizes" 
    

    Extract just the timestamp and filename:

    date_and_name <- sub("^[[:alnum:][:punct:][:blank:]]{43}", "", no_dirs)
    date_and_name
    
    ## [1] "May 07 16:18 README.TXT"
    ## [2] "Sep 09 21:00 TIMESTAMP" 
    ## [3] "Sep 09 10:00 dir.sizes" 
    

    Put them into a data.frame:

    # str_match() returns one row per input string:
    # column 1 is the full match, columns 2 and 3 are the capture groups
    m <- str_match(date_and_name, "([[:alnum:] :]{12}) (.*)$")
    dat <- data.frame(timestamp=m[,2],
                      filename=m[,3],
                      stringsAsFactors=FALSE)
    dat
    
    ##      timestamp   filename
    ## 1 May 07 16:18 README.TXT
    ## 2 Sep 09 21:00  TIMESTAMP
    ## 3 Sep 09 10:00  dir.sizes
    

    You still need to convert the timestamp to a POSIXct. That is straightforward, but note that this listing omits the year, so you have to supply one yourself.
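
    As a rough sketch of that conversion, and of the "only files updated since my last check" filter from the question: the year (2015), the UTC time zone, and the `last_check` value below are all assumptions made up for illustration, since none of them can be read from the listing. The month is looked up in R's built-in `month.abb` so the result does not depend on the locale.

    ```r
    # Rows copied from the data.frame printed above.
    listing <- data.frame(
      timestamp = c("May 07 16:18", "Sep 09 21:00", "Sep 09 10:00"),
      filename  = c("README.TXT", "TIMESTAMP", "dir.sizes"),
      stringsAsFactors = FALSE
    )

    # ASSUMPTION: the listing has no year, so one is supplied by hand.
    assumed_year <- 2015L

    # Map "May" -> 5 etc. using the built-in English abbreviations.
    mon  <- match(substr(listing$timestamp, 1, 3), month.abb)
    rest <- substr(listing$timestamp, 5, 12)   # e.g. "07 16:18"

    # ASSUMPTION: the server's time zone is unknown; UTC is a placeholder.
    listing$modified <- as.POSIXct(
      sprintf("%d-%02d-%s", assumed_year, mon, rest),
      format = "%Y-%m-%d %H:%M",
      tz = "UTC"
    )

    # Hypothetical time of the previous check.
    last_check <- as.POSIXct("2015-09-01 00:00", tz = "UTC")

    # Names of the files modified after the previous check.
    to_fetch <- listing$filename[listing$modified > last_check]
    to_fetch

    ## [1] "TIMESTAMP" "dir.sizes"
    ```

    Each name in `to_fetch` could then be appended to the directory URL and downloaded, for example with `curl_download()` from the curl package.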

    This example depends on the format of that server's directory listing. In particular, the {43} in the sub() call is the fixed column width of this listing, so adjust the regexes to match what your server returns.
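
    If counting columns proves fragile, one alternative is to split each line into whitespace-separated fields instead. The sketch below is not part of the approach above. `parse_listing` is a hypothetical helper, and it assumes a Unix-style listing with permissions, link count, owner, group, size, a three-token date, and then the file name. Lines that do not fit that layout are dropped.

    ```r
    # Sample lines copied from the raw listing above.
    lines <- c(
      "-rw-rw-r--    1 ftp      ftp          4259 May 07 16:18 README.TXT",
      "drwxrwxr-x    9 ftp      ftp            11 Sep 09 21:00 development",
      "-rw-r--r--    1 ftp      ftp          2566 Sep 09 10:00 dir.sizes"
    )

    # Hypothetical helper: parse an "ls -l"-style listing by field, not by column.
    parse_listing <- function(x) {
      # Groups: (1) permissions, (2) size, (3) three date tokens, (4) file name.
      pattern <- paste0(
        "^(\\S+)\\s+\\S+\\s+\\S+\\s+\\S+\\s+(\\d+)\\s+",
        "(\\S+\\s+\\S+\\s+\\S+)\\s+(.*)$"
      )
      m <- regmatches(x, regexec(pattern, x, perl = TRUE))
      m <- m[lengths(m) == 5]  # keep only lines matching the assumed layout
      data.frame(
        is_dir    = vapply(m, function(f) startsWith(f[2], "d"), logical(1)),
        size      = as.numeric(vapply(m, `[`, character(1), 3)),
        timestamp = vapply(m, `[`, character(1), 4),
        filename  = vapply(m, `[`, character(1), 5),
        stringsAsFactors = FALSE
      )
    }

    # Drop directories, as in the grep("^d", ...) step above.
    files <- subset(parse_listing(lines), !is_dir)
    files
    ```

    Because the file name is taken as everything after the date, names containing spaces survive intact. Windows-style or other non-Unix listings would still need a different pattern.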