
Downloading multiple JSON files from a website folder


I'm trying to download all the files whose names contain the word 'tree' from this link.

I know how to download them individually, but I can't figure out how to download them all at once based on that condition (the name contains the word 'tree').


Solution

  • This is likely to be very slow (see the note below):

    library(dplyr)
    library(rvest)

    # Base URL of the directory listing
    base_url <- "https://www1.ncdc.noaa.gov/pub/data/metadata/published/paleo/json/"

    # Read the index page and parse its file table
    res <- read_html(base_url) %>%
      html_nodes(css = "table") %>%
      html_table()

    # The second column holds the file names; [[2]] returns a plain vector
    # whether html_table() gives a data frame or a tibble
    json_names <- res[[1]][[2]]

    # Skip the first two rows, keep names containing "tree", build full URLs
    url_list <- json_names %>%
      as_tibble() %>%
      slice(3:nrow(.)) %>%
      filter(grepl("tree", value)) %>%
      pull(value) %>%
      paste0(base_url, .)
    

    Sample results:

    lapply(url_list[1:2], jsonlite::fromJSON)
    [[1]]
    [[1]]$xmlId
    [1] "4355"
    
    [[1]]$NOAAStudyId
    [1] "2657"
    
    [[1]]$studyName
    [1] "Adams - Fernow Experimental Forest - QUPR - ITRDB WV003"
    
    [[1]]$doi
    [1] "https://doi.org/10.25921/jzj2-vy39"
    

    NOTE:

    On a *nix machine, I would use wget instead.
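
As a rough sketch of that wget route, the call can be assembled and launched from R with `system2()` so it sits alongside the code above. The helper name `build_wget_args`, the `*tree*` accept pattern, and the `tree_json` output directory are assumptions for illustration; the server's robots.txt or listing format may require adjusting the flags:

```r
# Hypothetical helper: assemble wget arguments for a one-level recursive fetch.
#   -r  recurse through links on the index page
#   -l1 go only one level deep
#   -np never ascend to the parent directory
#   -nd do not recreate the remote directory tree locally
#   -A  accept only file names matching the pattern
#   -P  directory to save files into
build_wget_args <- function(base_url, pattern = "*tree*", dest_dir = "tree_json") {
  c("-r", "-l1", "-np", "-nd", "-A", pattern, "-P", dest_dir, base_url)
}

# Usage (requires wget on the PATH; shQuote() keeps the shell from expanding *tree*):
# base_url <- "https://www1.ncdc.noaa.gov/pub/data/metadata/published/paleo/json/"
# if (nzchar(Sys.which("wget"))) {
#   system2("wget", shQuote(build_wget_args(base_url)))
# }
```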