Search code examples
rweb-scrapingr-xlsxr-download.file

Download.file keeping original name or sheet name


I want to download the following link in R keeping the file name after browser download. http://www.coingecko.com/price_charts/export/1/usd.xls

Notice that the file name will be 'btc-usd-max.xls' if you download the file through browser. However, if I run the following code, the file name will be 'usd.xls'. However, the file name 'btc-usd-max.xls' conveys relevant information. Is it possible to retrieve this 'btc-usd-max' name using R?

link <- 'http://www.coingecko.com/price_charts/export/1/usd.xls'
download.file(url=link, destfile=basename(link), method='auto')

The solution provided in the following topic does not solve the problem. R download.file to a folder keeping original name

The aforementioned solution depends on content-disposition, which does not exists.

library(httr)
hd <- HEAD(link)
filename <- gsub(".*name=", "", headers(hd)$`content-disposition`)

I have tried using HEAD() command from httr package and also downloading the files using the download.file function.


Solution

  • The displayed file name is in a response header called content-disposition, but only if you get a status 200 from the server. The default user-agent in httr causes the server to return a status 403 that does not return this header, so you need to add a user agent that the server will respond to.

    Once this is done it's fairly easy to extract the file name.

    library(httr)
    
    url <- 'http://www.coingecko.com/price_charts/export/1/usd.xls'
    UA <- "Mozilla/5.0 (Windows NT 6.1; rv:85.0) Gecko/20100101 Firefox/85.0"
    res <- GET(url, user_agent(UA))
    filename <- strsplit(res$headers$`content-disposition`, '\"')[[1]][2]
    

    Now we have

    filename
    #> [1] "btc-usd-max.xls"
    

    And we can write the result like this:

    writeBin(res$content, filename)
    

    Strangely enough though, the actual binary data appears to be in csv format rather than xls format (I get a warning when I try to open it in Excel, whether downloading in R or directly in the browser). You might prefer to read it as a csv using:

    data <- read.csv(text = content(res, 'text'))
    

    Then write data to an Excel sheet named according to the filename.