
Querying API for every unique resource in a package using R


I'm writing a script to download all the unique Excel files in a package from an open data site that uses CKAN. I'm trying to write a function that loops through a list of the unique dataset IDs, gets the URL for each ID, and downloads the dataset to my computer. However, I'm having trouble writing the function.

So far the function only gives me the first dataset in the package, but there are 3 more that need to be downloaded.

library(tidyverse)
library(ckanr)
library(jsonlite)
library(readxl)
library(curl)
library(janitor)
library(mlr3misc)


url <- "http://osmdatacatalog.alberta.ca/" # set url to access data
ckanr_setup(url = url)

x <- resource_search(q = "name:wetland monitoring benthic invertebrate community", limit = 10) # get id of data
id <- ids(x$results)

id_download <- function(id) {
  for (i in id)
    a <- resource_show(i)
    b <- a$url
    destfile <- paste("C:/Users/Name/Documents/Database_updates/OSM_benthic_invertebrates/",basename(b))
    curl::curl_download(b, destfile)
}

Does anyone know where I'm going wrong?


Solution

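  • The for loop needs curly braces after it. Everything inside the braces is executed on each iteration. Without braces, only the first statement after for (...) is repeated; the remaining lines run once, after the loop has finished.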

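    It also looks like all the files might have the same name. If they do, each download will overwrite the previous one. To be safe, it makes sense to add something to the destfile name so that every file name is unique. Note also that paste() puts a space between its arguments by default, so the original path would contain a space before the file name; paste0() avoids that. This worked for me: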

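    library(ckanr)
    library(mlr3misc)  # provides ids(), as in the question's setup

    # Create the download folder (no warning if it already exists)
    dir.create("invertebrates", showWarnings = FALSE)

    url <- "http://osmdatacatalog.alberta.ca/" # set url to access data
    ckanr_setup(url = url)

    x <- resource_search(q = "name:wetland monitoring benthic invertebrate community", limit = 10) # get id of data
    id <- ids(x$results)

    id_download <- function(id) {
      for (i in id) {
        a <- resource_show(i)  # look up the resource metadata for this ID
        b <- a$url             # URL of the file itself

        # Prefix the file name with the first 4 characters of the ID
        # so that files sharing a name do not overwrite each other
        destfile <- paste0("./invertebrates/",
                           substr(i, 1, 4),
                           basename(b))

        curl::curl_download(b, destfile)
      }
    }

    id_download(id)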
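
To see the brace behaviour in isolation, here is a minimal sketch that needs no network access. The IDs and the example URL are made up for illustration, and make_destfile() is a hypothetical helper, not part of ckanr or curl. It also shows the paste() vs paste0() difference and one possible way to build a unique file name.

```r
# Hypothetical IDs for illustration only (not real CKAN resource IDs)
example_ids <- c("a1", "b2", "c3")

# Without braces: only the first statement after for (...) is the loop body
seen_unbraced <- character(0)
for (i in example_ids)
  current <- i                                # repeated for every ID
seen_unbraced <- c(seen_unbraced, current)    # runs once, after the loop

# With braces: every statement inside is repeated
seen_braced <- character(0)
for (i in example_ids) {
  current <- i
  seen_braced <- c(seen_braced, current)
}

print(seen_unbraced)  # only the last ID
print(seen_braced)    # all three IDs

# paste() separates its arguments with a space by default; paste0() does not
spaced   <- paste("dir/", "file.xlsx")
unspaced <- paste0("dir/", "file.xlsx")

# Hypothetical helper: prefix the file name with the first 4 characters
# of the ID so that files sharing a name get distinct destinations
make_destfile <- function(dir, id, url) {
  file.path(dir, paste0(substr(id, 1, 4), "_", basename(url)))
}

dest <- make_destfile("invertebrates", "abcd1234",
                      "http://example.org/files/data.xlsx")
print(dest)
```

Swapping the inline paste0() call in id_download() for a helper like this would keep the naming rule in one place, but that is a design preference rather than a requirement.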