Let´s assume that my dataframe looks like this:
I have this multi-layered json string response
in a column of the data frame and I would like to expand or parse its multiple columns in the same dataframe.
"[
{\"question\":
{\"id\":314123,
\"text\":\"What did you see in this module?\",
\"question_id\":151,
\"requirements\":
{\"answer_type\":\"text\",
\"text_required\":true}
},
\"response\":
{\"free_text\":
{\"id\":\"314123_ft\",
\"text\":\"Identifying critical knowledge\"}
}
},
{\"question\":
{\"id\":314124,
\"text\":\"What do you feel is the gap?\",
\"question_id\":152,
\"requirements\":
{\"answer_type\":\"text\",
\"text_required\":true}
},
\"response\":
{\"free_text\":
{\"id\":\"314124_ft\",
\"text\":\"Knowledge is clearly mapped out\"}
}
},
{\"question\":
{\"id\":314125,
\"text\":\"Which of the modules you need to focus on\",
\"question_id\":153,
\"requirements\":
{\"answer_type\":\"text\",
\"text_required\":true}
},
\"response\":
{\"free_text\":
{\"id\":\"3141125_ft\",
\"text\":\"preparation of key knowledge\"}
}
}
]"
I apologise if the indentation is not totally correct, as I did indent it manually from a single-line string.
Following other entries here, here, and here I have tried several methods, but none of them could expand the whole thing into new columns, getting different errors that I show here (I intentionally only chose the options with smaller number of lines in the code to keep the post small).
> fromJSON(df) %>%
+ unnest(c(response))
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘fromJSON’ for signature ‘"data.frame", "missing"’
> df <- df %>%
+ rowwise() %>%
+ do(data.frame(fromJSON(.$response, flatten = T))) %>%
+ ungroup() %>%
+ bind_cols(df %>% select(-response))
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 1, 0
I would appreciate the help of any JSON expert on this.
Here is one possible approach using map
to convert your column of json strings to a list column of dataframe plus some additional wrangling using tidyr
to unnest the nested dfs:
library(tidyverse)
df <- tibble(
id = 1:2,
response = c(response, response)
)
df |>
rename(id1 = id) |>
mutate(
response = map(response, jsonlite::fromJSON)
) |>
unnest_longer(response) |>
unnest_wider(response) |>
unnest_wider(c(response, question)) |>
unnest_wider(c(requirements, free_text), names_sep = "_")
#> # A tibble: 6 × 8
#> id1 id text question_id requirements_answer_…¹ requirements_text_re…²
#> <int> <int> <chr> <int> <chr> <lgl>
#> 1 1 314123 What d… 151 text TRUE
#> 2 1 314124 What d… 152 text TRUE
#> 3 1 314125 Which … 153 text TRUE
#> 4 2 314123 What d… 151 text TRUE
#> 5 2 314124 What d… 152 text TRUE
#> 6 2 314125 Which … 153 text TRUE
#> # ℹ abbreviated names: ¹requirements_answer_type, ²requirements_text_required
#> # ℹ 2 more variables: free_text_id <chr>, free_text_text <chr>
DATA
response <- "[
{\"question\":
{\"id\":314123,
\"text\":\"What did you see in this module?\",
\"question_id\":151,
\"requirements\":
{\"answer_type\":\"text\",
\"text_required\":true}
},
\"response\":
{\"free_text\":
{\"id\":\"314123_ft\",
\"text\":\"Identifying critical knowledge\"}
}
},
{\"question\":
{\"id\":314124,
\"text\":\"What do you feel is the gap?\",
\"question_id\":152,
\"requirements\":
{\"answer_type\":\"text\",
\"text_required\":true}
},
\"response\":
{\"free_text\":
{\"id\":\"314124_ft\",
\"text\":\"Knowledge is clearly mapped out\"}
}
},
{\"question\":
{\"id\":314125,
\"text\":\"Which of the modules you need to focus on\",
\"question_id\":153,
\"requirements\":
{\"answer_type\":\"text\",
\"text_required\":true}
},
\"response\":
{\"free_text\":
{\"id\":\"3141125_ft\",
\"text\":\"preparation of key knowledge\"}
}
}
]"