Download zip file to R when download link ends in '/download'


My issue is similar to this post, but the suggested solution does not appear to apply.

I have a lot of zipped data stored on an online server (B2Drop) that provides a download link ending in "/download" instead of ".zip". I have been unable to get the method described here to work.

I have created a test download page, https://b2drop.eudat.eu/s/K9sPPjWz3jxtXEq, where the download link https://b2drop.eudat.eu/s/K9sPPjWz3jxtXEq/download can be obtained by right-clicking the download button. Here is my script:

temp <- tempfile()
download.file("https://b2drop.eudat.eu/s/K9sPPjWz3jxtXEq/download",temp, mode="wb")
data <- read.table(unz(temp, "Test_file1.csv"))
unlink(temp)

When I run it, I get the error:

download.file("https://b2drop.eudat.eu/s/K9sPPjWz3jxtXEq/download",temp, mode="wb")
trying URL 'https://b2drop.eudat.eu/s/K9sPPjWz3jxtXEq/download'
Content type 'application/zip' length 558 bytes
downloaded 558 bytes

data <- read.table(unz(temp, "Test_file1.csv"))
Error in open.connection(file, "rt") : cannot open the connection
In addition: Warning message:
In open.connection(file, "rt") :
  cannot locate file 'Test_file1.csv' in zip file 'C:\Users\User_name\AppData\Local\Temp\RtmpMZ6gXi\file3e881b1f230e'

which typically indicates a problem with the working directory where R is looking for the file. In this case, that should be the temporary directory.


Solution

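  • The path inside the archive is wrong: the download itself succeeded, but the file does not sit at the top level of the zip. You can use list=TRUE to list the files in the archive, analogous to the -l option of the command-line unzip utility.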

    unzip(temp, list=TRUE)
    #                  Name Length                Date
    # 1 Test/Test_file1.csv    256 2021-09-27 10:13:00
    # 2 Test/Test_file2.csv    286 2021-09-27 10:14:00
    

    Also, prefer read.csv over read.table here, since the file is comma-delimited.

    data <- read.csv(unz(temp, "Test/Test_file1.csv"))
    head(data, 3)
    #   ID Variable1 Variable2 Variable Variable3
    # 1  1         f     54654       25        t1
    # 2  2         t       421       64        t2
    # 3  3         x      4521       85        t3
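
If the folder layout inside the archive is not known in advance, the two steps above (list the entries, then read by full internal path) can be combined. The following is only a minimal sketch: read_csv_from_zip is a hypothetical helper name, not part of base R or of the original answer, and it assumes the wanted file name occurs exactly once in the archive.

```r
# Hypothetical helper (an illustration, not part of base R):
# locate a file inside a zip archive by its base name and read it as CSV.
read_csv_from_zip <- function(zip_path, file_name, ...) {
  # List the archive's entries without extracting anything.
  entries <- unzip(zip_path, list = TRUE)$Name
  # Compare base names so a leading folder such as "Test/" is ignored.
  hits <- entries[basename(entries) == file_name]
  if (length(hits) == 0) {
    stop("'", file_name, "' not found in archive; entries are: ",
         paste(entries, collapse = ", "))
  }
  if (length(hits) > 1) {
    stop("'", file_name, "' matches several entries: ",
         paste(hits, collapse = ", "))
  }
  # read.csv opens the unz() connection and closes it again when done.
  read.csv(unz(zip_path, hits), ...)
}

# Usage with the link from the question (needs network access):
# temp <- tempfile(fileext = ".zip")
# download.file("https://b2drop.eudat.eu/s/K9sPPjWz3jxtXEq/download", temp, mode = "wb")
# data <- read_csv_from_zip(temp, "Test_file1.csv")
# unlink(temp)
```

Matching on basename() trades precision for convenience: it fails loudly when two folders contain a file with the same name, in which case passing the full internal path to unz() directly, as in the answer, is the safer choice.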