R: Can't download .zip from url


I'm trying to download the following zip file from the Internet and extract a .shp file from it: https://www.wpc.ncep.noaa.gov/archives/ero/20181227/shp_94e_2018122701.zip

Inside the .zip file, the .shp file I'm trying to extract is 94e2701.shp.

(I want to do this in R so I can automate downloading the .shp files for several dates.)

Based on some reading, here is the code I've tried:

shp_url = "https://www.wpc.ncep.noaa.gov/archives/ero/20181227/shp_94e_2018122701.zip"
tmp = tempfile()

download.file(shp_url,tmp,mode="wb")
# I have also tried without the "mode" argument but have gotten the same result

f_name = "94e2701.shp"
data <- sf::st_read(unz(tmp,f_name))
# Error: Cannot open "3"; The file doesn't seem to exist.
unlink(tmp)

When I go to the location of the temp file, I see it's named "file1b9026cd6821", but it's not a .zip, so I can't extract anything from it or look inside it.

What am I doing wrong here? Any help or guidance is much appreciated! Thanks!


Solution

  • BLUF (bottom line up front): a shapefile is more than the .shp file. Unzip more (ideally all) of the files in the archive and you'll get different results.

    For each step below, I used unzip on the command line to extract only the files listed in that step. In between steps, I removed the files not being tested.
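If you'd rather run the same experiment from inside R, utils::unzip() accepts a files argument that extracts only the named members of the archive. A minimal sketch; shapefile_parts() and read_with_parts() are hypothetical helper names, and tmp is assumed to be the archive downloaded in the question:

```r
# Hypothetical helper: member names of one shapefile layer for the given extensions.
shapefile_parts <- function(layer, exts) {
  paste0(layer, ".", exts)
}

# Hypothetical helper: extract only the requested parts into a fresh, empty
# directory (so no stray files are present) and read the layer from there.
# sf::st_read() raises an error if a part it needs is missing.
read_with_parts <- function(zipfile, layer, exts) {
  td <- tempfile()
  dir.create(td)
  utils::unzip(zipfile, files = shapefile_parts(layer, exts), exdir = td)
  sf::st_read(file.path(td, paste0(layer, ".shp")))
}

# Usage, with tmp being the downloaded archive:
# read_with_parts(tmp, "94e2701", c("shp", "shx"))                # step 4
# read_with_parts(tmp, "94e2701", c("shp", "shx", "dbf", "prj"))  # step 6
```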

    1. Just 94e2701.shp: error

    2. .shp and .prj: error

    3. .shp and .dbf: error

    4. .shp and .shx: partial success, but the CRS is not filled in

      data <- sf::st_read("94e2701.shp")
      # Reading layer `94e2701' from data source `C:\Users\r2\AppData\Local\Temp\Rtmpqoj4GE\94e2701.shp' using driver `ESRI Shapefile'
      # Simple feature collection with 2 features and 0 fields
      # Geometry type: POLYGON
      # Dimension:     XY
      # Bounding box:  xmin: -100.4 ymin: 28.27 xmax: -91.4 ymax: 39.77
      # CRS:           NA
      
    5. .shp, .shx, and .dbf: same as 4, no CRS

    6. .shp, .shx, .dbf, and .prj: success

      data <- sf::st_read("94e2701.shp")
      # Reading layer `94e2701' from data source `C:\Users\r2\AppData\Local\Temp\Rtmpqoj4GE\94e2701.shp' using driver `ESRI Shapefile'
      # Simple feature collection with 2 features and 7 fields
      # Geometry type: POLYGON
      # Dimension:     XY
      # Bounding box:  xmin: -100.4 ymin: 28.27 xmax: -91.4 ymax: 39.77
      # Geodetic CRS:  GCS_Sphere_EMEP
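Going by these results, you can check whether an archive carries all four parts before extracting anything, since utils::unzip(..., list = TRUE) returns the member names without extracting. A sketch; missing_parts() is a hypothetical helper, and the list of required extensions is taken from the experiments above rather than from any specification:

```r
# Parts sf::st_read() needed in the experiments above: .shp and .shx for the
# geometries, .dbf for the attribute fields, .prj for the CRS.
required_exts <- c("shp", "shx", "dbf", "prj")

# Return the extensions of `layer` that are missing from a vector of file names.
missing_parts <- function(files, layer, exts = required_exts) {
  same_layer <- tools::file_path_sans_ext(basename(files)) == layer
  present <- tolower(tools::file_ext(files[same_layer]))
  setdiff(exts, present)
}

# Usage: list the archive's members without extracting anything.
# members <- utils::unzip(tmp, list = TRUE)$Name
# missing_parts(members, "94e2701")   # character(0) means all four are present
```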
      

    Incidentally, what made me check this is a paragraph in ?sf::st_read:

    Note that stray files in data source directories (such as *.dbf) may lead to spurious errors that accompanying '*.shp' are missing.

    This made me wonder whether the other files in the directory were involved in your problem.

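    I do not believe there is a way in R to use unz(..) to get at all the files, and sf::st_read() expects a path to the data source rather than a connection like the one unz() returns. Note also that the downloaded tmp file is a valid zip even though tempfile() gives it no extension; unzip() reads it fine. If you don't want to unzip the files into your current directory (you just want to "look" at them and discard them later), create a temp directory, unzip into that, and open the file from there.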

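    dir.create(td <- tempfile())                 # fresh, empty temp directory
    unzip(tmp, exdir = td)                       # extract every file in the archive
    data <- sf::st_read(file.path(td, f_name))   # .shp is read along with its sidecar files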
    # Reading layer `94e2701' from data source `C:\Users\r2\AppData\Local\Temp\Rtmpqoj4GE\file185581a1d4d01\94e2701.shp' using driver `ESRI Shapefile'
    # Simple feature collection with 2 features and 7 fields
    # Geometry type: POLYGON
    # Dimension:     XY
    # Bounding box:  xmin: -100.4 ymin: 28.27 xmax: -91.4 ymax: 39.77
    # Geodetic CRS:  GCS_Sphere_EMEP
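Since the goal is to repeat this for several dates, the steps can be wrapped in a function. A sketch under stated assumptions: ero_zip_url() guesses the URL pattern from the single example URL in the question (the "94e" product code and "01" suffix may differ for other files, and not every date is guaranteed to have an archive), and read_zipped_shp() is a hypothetical name. Using tempfile(fileext = ".zip") also gives the downloaded file a recognizable extension.

```r
# Hypothetical URL builder, inferred from the one example URL in the question:
# date folder, product code, then date plus an issuance suffix. Verify it first.
ero_zip_url <- function(date, product = "94e", issuance = "01") {
  d <- format(as.Date(date), "%Y%m%d")
  sprintf("https://www.wpc.ncep.noaa.gov/archives/ero/%s/shp_%s_%s%s.zip",
          d, product, d, issuance)
}

# Download one archive, unzip all of it into a fresh temp directory, and read
# the first .shp found there, so the layer name need not be known in advance.
read_zipped_shp <- function(url) {
  zipfile <- tempfile(fileext = ".zip")          # temp file with a .zip extension
  td <- tempfile()
  dir.create(td)
  on.exit(unlink(c(zipfile, td), recursive = TRUE))  # clean up after reading
  utils::download.file(url, zipfile, mode = "wb")  # binary mode matters on Windows
  utils::unzip(zipfile, exdir = td)
  shp <- list.files(td, pattern = "\\.shp$", full.names = TRUE,
                    recursive = TRUE, ignore.case = TRUE)
  if (length(shp) == 0) stop("no .shp file found in ", url)
  sf::st_read(shp[1])                            # data is read into memory
}

# Usage for several dates (each date's archive must exist):
# dates <- seq(as.Date("2018-12-25"), as.Date("2018-12-27"), by = "day")
# layers <- lapply(ero_zip_url(dates), read_zipped_shp)
```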