untar through a list of .gz


In R, I want to download and untar all the .gz files in every directory of this site: ftp://ftp.dwd.de/pub/data/gpcc/GPCC_DI/

I am having difficulty with this: I put all ~60 of the .gz files from ftp://ftp.dwd.de/pub/data/gpcc/GPCC_DI/ into a list in R and would like to untar each one into a directory, but for the life of me I can't figure it out.

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"

> typeof(list_gz)
[1] "list"
> head(list_gz)
[[1]]
[1] "ftp://ftp.dwd.de/pub/data/gpcc/GPCC_DI//2014/GPCC_DI_201401.nc.gz"

[[2]]
[1] "ftp://ftp.dwd.de/pub/data/gpcc/GPCC_DI//2014/GPCC_DI_201402.nc.gz"

> sapply(list_gz, function(i) getURL(untar(i)))
gzip: can't stat: ftp://ftp.dwd.de/pub/data/gpcc/GPCC_DI//2014/GPCC_DI_201401.nc.gz (ftp://ftp.dwd.de/pub/data/gpcc/GPCC_DI//2014/GPCC_DI_201401.nc.gz.gz): No such file or directory
 Error in function (type, msg, asError = TRUE)  : 
  Failed to connect to 0 port 80: Connection refused 

I am not too sure here. Maybe I should rework the first half of my code to download the ~60 .gz files first, instead of trying to download and untar them in one list/sapply step. Thanks!
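The first half, collecting the URLs from every year directory, can be sketched with the curl package. This is a minimal sketch, assuming the DWD FTP server answers name-only directory listings (libcurl's `dirlistonly` option) and that each year folder directly contains the `GPCC_DI_YYYYMM.nc.gz` files; the helper names `list_ftp_dir()` and `gz_urls()` are hypothetical. It builds a character vector rather than a list, which `sapply()`/`lapply()` iterate over just the same.

```r
# Hypothetical helper: return the entry names in one FTP directory.
# Assumes the server supports name-only listings (libcurl's dirlistonly option).
list_ftp_dir <- function(url) {
  h <- curl::new_handle(dirlistonly = TRUE)
  res <- curl::curl_fetch_memory(url, handle = h)
  entries <- strsplit(rawToChar(res$content), "\r?\n")[[1]]
  entries[nzchar(entries)]  # drop empty lines
}

# Hypothetical helper: keep only the .gz names and prefix them with the directory URL.
gz_urls <- function(dir_url, entries) {
  gz <- entries[grepl("\\.gz$", entries)]
  # paste0() would turn a zero-length input into "", so return early instead
  if (length(gz) == 0) return(character(0))
  # normalise the directory URL to end in exactly one "/"
  paste0(sub("/*$", "/", dir_url), gz)
}

# Example (needs network access; assumes year folders are named like "2014"):
# base_url <- "ftp://ftp.dwd.de/pub/data/gpcc/GPCC_DI/"
# years <- grep("^\\d{4}$", list_ftp_dir(base_url), value = TRUE)
# list_gz <- unlist(lapply(years, function(y) {
#   dir_url <- paste0(base_url, y, "/")
#   gz_urls(dir_url, list_ftp_dir(dir_url))
# }))
```

Keeping the listing and the URL-building apart means the string handling can be checked offline, and the network calls only happen when the example is run.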


Solution

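  • I would probably download first and then unzip.

    Make sure you set a working directory first (for example with setwd()), because the year folders and the downloaded files are created relative to it.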

    library(curl)
    library(stringr)
    
    list_gz = list("ftp://ftp.dwd.de/pub/data/gpcc/GPCC_DI//2014/GPCC_DI_201401.nc.gz",
                   "ftp://ftp.dwd.de/pub/data/gpcc/GPCC_DI//2014/GPCC_DI_201402.nc.gz")
    
    sapply(list_gz, function(x) {
      # download the file and save it under the same name (the part after the last '/')
      file_name = basename(x)
      # the first four digits of the name are the year, e.g. "2014" from GPCC_DI_201401.nc.gz
      year = str_match(file_name, "\\d{4}")[1, 1]
      if (!dir.exists(year)) dir.create(year)
      dest = file.path(year, file_name)
      curl_download(x, destfile = dest)
      # .nc.gz files are gzip-compressed NetCDF files, not tar archives, so untar() fails on them;
      # R.utils::gunzip() writes e.g. 2014/GPCC_DI_201401.nc, and remove = FALSE keeps the .gz
      R.utils::gunzip(dest, remove = FALSE, overwrite = TRUE)
    })
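Because `untar()` is meant for tar archives and each `.nc.gz` here is a single gzip-compressed file, the unzipping step can also be done with base R alone, without an extra package. This is a minimal sketch; `gunzip_base()` is a hypothetical helper name, and it assumes the output should sit next to the input with the `.gz` suffix dropped.

```r
# Hypothetical helper: decompress a single-file .gz (e.g. GPCC_DI_201401.nc.gz)
# using only base R connections. Returns the output path invisibly.
gunzip_base <- function(src, dest = sub("\\.gz$", "", src), chunk_size = 1e6) {
  inp <- gzfile(src, open = "rb")      # gzfile() decompresses transparently
  on.exit(close(inp))
  out <- file(dest, open = "wb")       # binary mode matters on Windows
  on.exit(close(out), add = TRUE)
  # copy in chunks so large NetCDF files are not read into memory at once
  repeat {
    chunk <- readBin(inp, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break
    writeBin(chunk, out)
  }
  invisible(dest)
}
```

Inside the `sapply()` loop it could be called on the downloaded path, for example `gunzip_base(paste0(year, "/", file_name))`, which would leave `2014/GPCC_DI_201401.nc` alongside the original `.gz`.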