How to transform a large JSON file to a clean dataframe


I would like to store in a CSV file the list of all publicly funded projects in France, which are listed on the website below:

https://aides-territoires.beta.gouv.fr/aides/?integration=&targeted_audiences=&perimeter=&text=&apply_before=&is_charged=all&action=search-filter&page=1

I used the website's API to get the JSON containing all the projects, with the following command (using the "httr" package; I then try to parse the result with "jsonlite"):

my_url <- "https://aides-territoires.beta.gouv.fr/api/aids/all/"

results <- 
  httr::content(
    httr::GET(my_url),
    as="text",  
    httr::content_type_json(),  
    encoding= "UTF-8"    
  )

The problem comes afterwards. I am a complete beginner at manipulating JSON, and I cannot manage to convert the information contained in "results" into a data frame with one column per project field ("id", "slug", "url", "name", etc.). Some fields are lists, others are character vectors, etc.

I tried some commands I found, such as the one below:

df <- data.frame(
  lapply(c("id","slug","url","name","name_initial","short_title","financers",
           "instructors","programs","description","eligibility","perimeter",
           "mobilization_steps","origin_url","is_call_for_project",
           "application_url","is_charged",
           "destinations","start_date","predeposit_date","submission_deadline",
           "subvention_rate_lower_bound","subvention_rate_upper_bound",
           "loan_amount","recoverable_advance_amount","contact","recurrence",
           "project_examples","import_data_url","import_data_mention",
           "import_share_licence","date_created","date_updated"), 
         function(x){fromJSON(results,flatten = TRUE)$results[[x]]})
)

But I get the error message below:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 2, 0, 3, 4, 11, 7, 5, 15


Solution

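A likely cause of the error, sketched on made-up data: fromJSON() already returns the "results" element as a data frame, and fields such as "financers" come back as list-columns whose elements have different lengths. Passing such a list-column to data.frame() makes it try to turn every element into its own column, and elements of different lengths cannot share a row count. The toy JSON below is an assumption that only mimics the shape of the API response; it is not real API data.

```r
library(jsonlite)

# Toy JSON with the same shape as the API response (made-up records)
toy <- '{"results": [
  {"id": 1, "name": "A", "financers": ["X", "Y"]},
  {"id": 2, "name": "B", "financers": ["X", "Y", "Z"]}
]}'

res <- fromJSON(toy)$results

# "res" is already a data frame; "financers" is a list-column
class(res)
sapply(res, class)

# Rebuilding the table with data.frame() expands the list-column element by
# element, and elements of length 2 and 3 cannot be combined into rows
err <- tryCatch(
  data.frame(lapply(c("id", "financers"), function(x) res[[x]])),
  error = function(e) conditionMessage(e)
)
err
```

So the parsed object does not need to be rebuilt column by column; it only needs to be kept as it is, with its list-columns.
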
  • With the httr2 package you can do:

    library(tidyverse)
    library(httr2)
    
    "https://aides-territoires.beta.gouv.fr/api/aids/all/" %>% 
      request() %>% 
      req_perform() %>% 
      resp_body_json(simplifyVector = TRUE) %>% # simplifyVector = TRUE is the real hero
      pluck("results") %>% # Grab the results list
      as_tibble() # Create a tibble
    
    # A tibble: 3,282 × 31
           id slug            url   name  short_title financers instructors programs
        <int> <chr>           <chr> <chr> <chr>       <list>    <list>      <list>  
     1  70202 2d94-se-former… /aid… Se f… ""          <chr [1]> <chr [0]>   <chr>   
     2   8075 ae3b-etude-reh… /aid… Mett… ""          <chr [1]> <chr [0]>   <chr>   
     3 117392 c650-preserver… /aid… Prés… ""          <chr [1]> <chr [0]>   <chr>   
     4 117180 e8e0-soutenir-… /aid… Sout… ""          <chr [1]> <chr [0]>   <chr>   
     5  78196 ef73-soutenir-… /aid… Sout… ""          <chr [1]> <chr [0]>   <chr>   
     6  22827 c372-aide-a-la… /aid… Fina… ""          <chr [1]> <chr [0]>   <chr>   
     7  90762 6564-creer-une… /aid… Crée… ""          <chr [2]> <chr [0]>   <chr>   
     8  30762 9e6a-soutien-d… /aid… Sout… ""          <chr [1]> <chr [0]>   <chr>   
     9  90797 f299-activites… /aid… Sout… ""          <chr [1]> <chr [0]>   <chr>   
    10  94752 46de-accelerer… /aid… Déve… ""          <chr [2]> <chr [0]>   <chr>   
    # … with 3,272 more rows, and 23 more variables: description <chr>,
    #   eligibility <chr>, perimeter <chr>, mobilization_steps <list>,
    #   origin_url <chr>, categories <list>, is_call_for_project <lgl>,
    #   application_url <chr>, targeted_audiences <list>, aid_types <list>,
    #   destinations <list>, start_date <chr>, predeposit_date <chr>,
    #   submission_deadline <chr>, subvention_rate_lower_bound <int>,
    #   subvention_rate_upper_bound <int>, loan_amount <int>, …
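
Since the goal in the question is a CSV file, note that list-columns (financers, programs, etc.) cannot be written to CSV as they are. Here is a minimal sketch of one way to handle that, assuming each list-column holds plain vectors and that joining the values with "; " is acceptable. It uses base R and jsonlite on made-up data rather than the real API response, and the output path is hypothetical.

```r
library(jsonlite)

# Toy stand-in for the "results" table (made-up records, not real API data)
toy <- '{"results": [
  {"id": 1, "name": "A", "financers": ["X", "Y"]},
  {"id": 2, "name": "B", "financers": []}
]}'
aids <- fromJSON(toy)$results

# Collapse every list-column into one "; "-separated string per row
# (an empty element becomes an empty string)
is_list_col <- vapply(aids, is.list, logical(1))
aids[is_list_col] <- lapply(
  aids[is_list_col],
  function(col) {
    vapply(col, function(x) paste(unlist(x), collapse = "; "), character(1))
  }
)

# All columns are now atomic, so the table can be written as CSV
out_file <- file.path(tempdir(), "aids.csv")  # hypothetical output path
write.csv(aids, out_file, row.names = FALSE, fileEncoding = "UTF-8")
```

The same collapsing step should work on the tibble above before writing it out. If one row per financer is preferred instead, tidyr::unnest() would be the alternative.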