Download a .csv file from GitHub using an httr GET request


I am trying to set up an automated download in R, using the GET function from the httr package, of a CSV file hosted on GitHub.

Here is the table I am trying to download.

https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv

I can make the connection to the file using the following GET request:

library(httr)

x <- httr::GET("https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")

However, I am unsure how to then convert the response into a data frame that resembles the table on GitHub.

Any assistance would be much appreciated.


Solution

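  • I am new to R, but here is my solution.

    You need to request the raw version of the CSV file, which GitHub serves from raw.githubusercontent.com. The github.com URL containing /blob/ returns the HTML page that displays the file, not the file itself.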
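
    As a minimal sketch of that URL change, the page URL from the question can be rewritten into the raw URL with a regular expression. The helper name to_raw_url is hypothetical (it is not part of httr), and the pattern assumes the usual github.com/<owner>/<repo>/blob/<branch>/<path> layout.

    ```r
    # Hypothetical helper: turn a github.com "blob" page URL into its raw-file URL.
    # Assumes the layout https://github.com/<owner>/<repo>/blob/<branch>/<path>.
    to_raw_url <- function(blob_url) {
      sub("^https://github\\.com/([^/]+)/([^/]+)/blob/",
          "https://raw.githubusercontent.com/\\1/\\2/",
          blob_url)
    }

    page_url <- "https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
    raw_url <- to_raw_url(page_url)
    print(raw_url)
    ```

    URLs that do not match the pattern are returned unchanged, so the helper is safe to call on a URL that is already raw.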

    library(httr)
    
    x <- httr::GET("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv")
    
    # Save to file
    bin <- content(x, "raw")
    writeBin(bin, "data.csv")
    
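    # Read as csv (the file uses "." as the decimal mark, so keep the default dec)
    dat = read.csv("data.csv", header = TRUE)
    
    # read.csv prefixes the date column names with "X"; strip only that leading "X"
    colnames(dat) = sub("^X", "", colnames(dat))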
    
    # Group by country name (to sum over regions)
    # Skip the first four columns, which contain metadata
    countries = aggregate(dat[, 5:ncol(dat)], by=list(Country.Region=dat$Country.Region), FUN=sum)
    
    # Here is the table of the most recent total confirmed cases
    countries_total = countries[, c(1, ncol(countries))]
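
    If the file is not needed on disk, the response body can probably be parsed in memory instead. The sketch below is an alternative to the writeBin step, not a replacement the original answer proposed: fetch_csv and parse_csv_text are hypothetical helper names, and the sample rows are made-up values that only mimic the column layout of the file. httr::content(resp, as = "text") returns the body as a string, which read.csv(text = ...) can read directly.

    ```r
    # Hypothetical helper: parse CSV text into a data frame.
    # check.names = FALSE keeps date headers such as "1/22/20" unchanged,
    # so no "X" prefix has to be stripped afterwards.
    parse_csv_text <- function(csv_text) {
      read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = FALSE)
    }

    # Hypothetical helper: download a CSV and parse it without a temporary file.
    # Requires the httr package to be installed when it is called.
    fetch_csv <- function(url) {
      resp <- httr::GET(url)
      httr::stop_for_status(resp)  # stop with an error on 4xx/5xx responses
      parse_csv_text(httr::content(resp, as = "text", encoding = "UTF-8"))
    }

    # Made-up sample with the same column layout, so the parsing and
    # aggregation steps can be tried without network access.
    sample_text <- paste(
      "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20",
      "A1,Aland,0.5,1.5,1,2",
      "A2,Aland,0.5,1.5,3,4",
      ",Borduria,2.5,3.5,5,6",
      sep = "\n"
    )
    dat <- parse_csv_text(sample_text)

    # Sum the date columns (column 5 onward) per country, as in the answer above
    totals <- aggregate(dat[, 5:ncol(dat)],
                        by = list(Country.Region = dat[["Country/Region"]]),
                        FUN = sum)
    print(totals)
    ```

    With the real data, dat <- fetch_csv(raw_url) would take the place of the sample, where raw_url stands for the raw.githubusercontent.com address used above.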
    

    [Figure: the output graph]

    How I got this to work: