I am pulling indicator data from an API which gives me back data frames containing nested data frames of varying sizes to hold additional metadata. I am trying to rectangle it, but without much luck.
Each indicator is 'tagged' with a number of groups (up to 15, out of a possible 100), both with group name and group id. Each group can be further categorised under a range of group sets.
I'm looking to transform my indicators
data frame to add additional columns, with the names of the group sets, and the values being the corresponding group name for that indicator, or NA if not relevant.
The data comes from the API like this:
library(tibble)
indicators <- tibble::tibble(
name = c("Number of lettuces", "Number of oranges"),
groups = list(
data.frame(
name = c("Number"),
id = c("uid001")),
data.frame(
name = c("Oranges", "Citrus", "Number"),
id = c("uid003", "uid004", "uid001")
)
)
)
indicator_groups <- tibble::tibble(
name = c("Number", "Oranges", "Citrus"),
id = c("uid001", "uid003", "uid004"),
sets = list(
data.frame(name = c("Unit_of_measure"), id = c("gid003")),
data.frame(name = c("Fruit"), id = c("gid001")),
data.frame(name = c("Fruit", "Type"), id = c("gid001", "gid002"))
)
)
indicator_group_sets <- tibble::tibble(
name = c("Fruit", "Type", "Unit_of_measure"),
id = c("gid001", "gid002", "gid003"),
groups = list(
data.frame(id = c("uid003", "uid004")),
data.frame(id = c("uid003", "uid004")),
data.frame(id = c("uid001"))
)
)
I'm sure some combination of dplyr::left_join()
and/or tidyr::hoist()
should be able to do it, but I'm struggling to understand how. I know I should only need one of indicator_groups
or indicator_group_sets
.
In previous attempts, I've managed to get to one column per group (but not per group set), and/or one row per indicator/group combination and lots of NAs which I haven't been able to reduce.
Desired result:
indicators
#> name Fruit Type Unit_of_measure
#> 1 Number of lettuces <NA> <NA> Number
#> 2 Number of oranges Oranges Citrus Number
Can anyone give me pointers to how to sensibly go about this please?
I presume there's something more elegant, but this seems to work. It's a little tricky since each of your tables has multiple fields called "name."
library(tidyverse)
indicators_tidy = indicators |>
unnest_longer(groups) |>
unnest_wider(groups, names_sep = "_")
indicator_groups_tidy <- indicator_groups |>
unnest_longer(sets) |>
unnest_wider(sets, names_sep = "_")
indicators_tidy |>
left_join(indicator_groups_tidy,
join_by(groups_name == name, groups_id == id))
Result
# A tibble: 5 × 5
name groups_name groups_id sets_name sets_id
<chr> <chr> <chr> <chr> <chr>
1 Number of lettuces Number uid001 Unit_of_measure gid003
2 Number of oranges Oranges uid003 Fruit gid001
3 Number of oranges Citrus uid004 Fruit gid001
4 Number of oranges Citrus uid004 Type gid002
5 Number of oranges Number uid001 Unit_of_measure gid003