
How can I refresh a data source from an R package automatically? Specifically, the coronavirus CRAN package's data


I am working with the Johns Hopkins coronavirus R package, but I haven't figured out how to get it to give me the updated underlying data each day. I have restarted R and reloaded the package, but the data seems frozen at the version I installed: it only changes when I reinstall the package. The data behind this package is updated nightly in its repository, and I'm looking for a good way to keep my copy updated daily as well.

Thanks in advance for any help you can provide!


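library(coronavirus)
library(dplyr)

# Load the dataset bundled with the installed package
data("coronavirus")

# Total cases per country and case type, largest first
summary_df <- coronavirus %>%
  group_by(Country.Region, type) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)

# Running total of cases per location and case type;
# sort by date within each group so cumsum() accumulates in time order
df <- coronavirus %>%
  group_by(Province.State, Country.Region, Lat, Long, type) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(TotalCasesRegion = cumsum(cases))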

Solution

  • One option is to download the dataset directly from the package author's project on GitHub (assuming the data there makes its way into the package as-is).

    # mode = "wb" keeps the binary .rda file intact on Windows
    download.file("https://github.com/RamiKrispin/coronavirus/raw/master/data/coronavirus.rda", "cv", mode = "wb")
    load("cv")
    

    This appears to be the latest dataset (at the time of writing):

    max(coronavirus$date)
    [1] "2020-03-04"
    
    nrow(coronavirus)
    [1] 2777
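To get a daily refresh without reinstalling the package, the download above can be wrapped in a small helper that reuses a cached copy and only re-downloads once the copy is older than a day. This is a sketch, not part of the coronavirus package: `get_coronavirus_data()` and `check_freshness()` are hypothetical names, the cache path `coronavirus_cache.rda` is an arbitrary choice, and it assumes the file at the GitHub URL still exists and still stores an object named `coronavirus` with a `date` column.

```r
# Hypothetical helper (not part of the coronavirus package): download the
# coronavirus.rda file at most once every `max_age_hours` and return the
# `coronavirus` data frame it contains. Assumes the file still exists at
# `url` and still stores an object named `coronavirus`.
get_coronavirus_data <- function(
    url = "https://github.com/RamiKrispin/coronavirus/raw/master/data/coronavirus.rda",
    cache_file = "coronavirus_cache.rda",
    max_age_hours = 24) {
  # Re-download if there is no cached copy or the cached copy is too old
  stale <- !file.exists(cache_file) ||
    difftime(Sys.time(), file.mtime(cache_file), units = "hours") > max_age_hours
  if (stale) {
    # mode = "wb" writes the binary .rda file unchanged on Windows
    download.file(url, cache_file, mode = "wb", quiet = TRUE)
  }
  # Load into a private environment so the file's contents do not
  # overwrite objects in the caller's workspace
  env <- new.env()
  load(cache_file, envir = env)
  get("coronavirus", envir = env)
}

# Hypothetical freshness check: warn when the newest date in the data is
# more than `max_lag_days` days before today; returns that date invisibly.
check_freshness <- function(df, max_lag_days = 2) {
  latest <- max(df$date)
  lag_days <- as.numeric(Sys.Date() - latest)
  if (lag_days > max_lag_days) {
    warning("Newest date in the data is ", latest, " (", lag_days, " days old)")
  }
  invisible(latest)
}
```

Calling `coronavirus <- get_coronavirus_data()` followed by `check_freshness(coronavirus)` at the top of an analysis script would then replace `data("coronavirus")`. Because the cache lives in a file rather than in the installed package, restarting R does not force a new download; only the file's age does.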