I am trying to extract data from a nested tibble. Within the outer tibble, some of the inner tibbles may be missing or incomplete. If a column does not exist, I would like to return NA.
df <- tibble(a = tibble(iris),
             b = tibble(iris[1:2]),
             c = NULL)
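As a quick check of what this constructs (a sketch, not part of the original attempt): tibble() silently drops NULL arguments, so df ends up with only the packed columns a and b, and b has no Species column.

```r
library(tibble)

df <- tibble(a = tibble(iris),
             b = tibble(iris[1:2]),
             c = NULL)

names(df)            # only "a" and "b"; the NULL argument was dropped
"c" %in% names(df)   # FALSE
names(df$b)          # "Sepal.Length" "Sepal.Width", so no Species column
```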
Now I'd like to extract the column 'Species' from each nested tibble, with the generated column filled with NA if no data are available, so that the result equals:
tibble(a_s = iris$Species,
       b_s = NA,
       c_s = NA)
Is there any way I could achieve this?
I naively tried:
transmute(df, a_s = a$Species,
          b_s = b$Species,
          c_s = c$Species)
which of course only works for a_s, generates a warning for b_s, and throws an error for c_s.
I have tried writing a helper function that checks whether each column exists, but it didn't work for nested data frames. Any ideas on how to solve this?
UPDATE: For clarity, I always want to generate the output as specified, whether or not tibble c is there.
Using grepl within ifelse to check for Species, and do.call to build the final tibble:
library(dplyr)
do.call(tibble, sapply(c("a", "b", "c"), function(x)
  ifelse(any(grepl("Species", names(df[[x]]))),
         df[[x]]["Species"],
         NA_character_))) %>%
  rename_with(~ paste0(.x, "_s"))
# A tibble: 150 × 3
a_s b_s c_s
<fct> <chr> <chr>
1 setosa NA NA
2 setosa NA NA
3 setosa NA NA
4 setosa NA NA
5 setosa NA NA
6 setosa NA NA
7 setosa NA NA
8 setosa NA NA
9 setosa NA NA
10 setosa NA NA
# … with 140 more rows
# ℹ Use `print(n = ...)` to see more rows
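An alternative along the lines of the helper function mentioned in the question might look like the sketch below. It checks explicitly whether the nested tibble and the column exist, instead of going through ifelse. The name pull_or_na and its arguments are hypothetical, chosen only for illustration; it returns a plain logical NA, so b_s and c_s come out as logical columns, as in the requested result, rather than character.

```r
library(tibble)

df <- tibble(a = tibble(iris),
             b = tibble(iris[1:2]),
             c = NULL)  # NULL is dropped, so df only has a and b

# Hypothetical helper: return column `col` of the nested tibble `name`,
# or NA when the nested tibble or the column does not exist.
pull_or_na <- function(data, name, col) {
  nested <- if (name %in% names(data)) data[[name]] else NULL
  if (is.data.frame(nested) && col %in% names(nested)) nested[[col]] else NA
}

nested_names <- c("a", "b", "c")
cols <- lapply(nested_names, function(nm) pull_or_na(df, nm, "Species"))
names(cols) <- paste0(nested_names, "_s")

# tibble() recycles the length-one NA values to the 150 rows of a_s.
result <- do.call(tibble, cols)
```

Because the helper only looks at names, it should behave the same whether or not c is present in df, which is the case described in the UPDATE.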