Tags: r, tidyverse, nested-tibble

Creating columns in a nested tibble if a column does not exist


I am trying to extract data from a nested tibble. Within the outer tibble, not all inner tibbles necessarily exist or are complete. When a column does not exist, I would like to return NA.

library(dplyr)

df <- tibble(a = tibble(iris),
             b = tibble(iris[1:2]),
             c = NULL)
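It helps to check what this constructor actually produces. Assuming a recent tibble release, `tibble()` splices the unnamed `iris` into each inner tibble, stores `a` and `b` as data-frame columns of `df`, and silently ignores the `c = NULL` argument, so `df` has no column `c` at all:

```r
library(tibble)

# Same construction as in the question
df <- tibble(a = tibble(iris),
             b = tibble(iris[1:2]),
             c = NULL)

names(df)           # "a" "b": the NULL argument `c` is dropped by tibble()
"c" %in% names(df)  # FALSE
names(df$a)         # the five iris columns, including "Species"
names(df$b)         # "Sepal.Length" "Sepal.Width"
```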

Now I'd like to extract the column 'Species' from each nested tibble, filling the generated column with NA when no data are available, so that the result equals:

tibble(a_s = iris$Species, 
       b_s = NA, 
       c_s = NA)

Is there any way I could achieve this?

I naively tried:

transmute(df, a_s = a$Species,
              b_s = b$Species,
              c_s = c$Species)

which of course only works for a_s, generates a warning for b_s and throws an error for c_s.
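The three outcomes can be reproduced outside `transmute()`. This is a sketch of roughly what happens for each column, assuming the `df` from the question; inside `transmute()` the error for `c` is additionally wrapped by dplyr, but the underlying cause is the same: `c` is not a column of `df`, so the name falls back to the base R function `c()`.

```r
library(tibble)

df <- tibble(a = tibble(iris),
             b = tibble(iris[1:2]),
             c = NULL)

# a contains Species, so this is the 150-element factor we want
a_s <- df$a$Species

# b has no Species column: `$` on a tibble warns
# ("Unknown or uninitialised column") and returns NULL
b_s <- suppressWarnings(df$b$Species)
is.null(b_s)  # TRUE

# c is not a column of df, so the name `c` resolves to base::c(),
# a builtin function, and `$` on a function is an error
c_msg <- tryCatch(c$Species, error = function(e) conditionMessage(e))
c_msg  # "object of type 'builtin' is not subsettable"
```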

I have tried creating a helper function to check whether each column exists, but this didn't work for nested data frames. Any ideas on how to solve this?
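For nested data frames, a helper has to check existence at two levels: first whether the outer column exists in `df`, then whether that inner tibble has the requested column. A minimal sketch, assuming the `df` from the question; `pluck_nested()` is a hypothetical name, not part of the question, the answer, or any package:

```r
library(tibble)

# Hypothetical helper: return data[[outer]][[inner]] when both levels exist,
# otherwise `default`. `&&` short-circuits, so data[[outer]] is only
# inspected when the outer column exists.
pluck_nested <- function(data, outer, inner, default = NA) {
  if (outer %in% names(data) && inner %in% names(data[[outer]])) {
    data[[outer]][[inner]]
  } else {
    default
  }
}

df <- tibble(a = tibble(iris),
             b = tibble(iris[1:2]),
             c = NULL)

# tibble() recycles the length-1 defaults to the 150 rows of a_s
out <- tibble(a_s = pluck_nested(df, "a", "Species"),
              b_s = pluck_nested(df, "b", "Species"),
              c_s = pluck_nested(df, "c", "Species"))
out
```

Because the outer name is checked before it is indexed, the helper never touches the base function `c()`, which is what made the naive `c$Species` fail.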

UPDATE: for clarity, I always want to generate the output as specified, whether or not tibble c is present.
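Since `c` may or may not be present, one option is to drive the extraction from a fixed vector of expected names instead of from `names(df)`, so the output always has `a_s`, `b_s` and `c_s`. This sketch swaps in `purrr::pluck()` (an assumption: purrr is not used in the question), whose `.default` value is returned when any level of the path is missing:

```r
library(tibble)
library(purrr)

df <- tibble(a = tibble(iris),
             b = tibble(iris[1:2]),
             c = NULL)

# The nested tibbles we always want in the output, whether or not they exist
expected <- c("a", "b", "c")

# pluck() walks df -> nested tibble -> Species and returns .default (NA)
# when the outer or inner element is absent, instead of raising an error
cols <- map(setNames(expected, paste0(expected, "_s")),
            function(nm) pluck(df, nm, "Species", .default = NA))

# Splice the named list into tibble(); the length-1 NAs are recycled
out <- tibble(!!!cols)
out
```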


Solution

  • Use grepl() within ifelse() to check whether each nested tibble has a Species column, and do.call() to assemble the final tibble.

    library(dplyr)
    
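    # For each expected nested tibble, take its Species column if present,
    # otherwise a single NA_character_ that tibble() recycles to 150 rows.
    # df[["c"]] is NULL because c is not a column, so names() is NULL and
    # any(grepl(...)) is FALSE. The anchored pattern avoids partial matches.
    do.call(tibble, sapply(c("a", "b", "c"), function(x)
      ifelse(any(grepl("^Species$", names(df[[x]]))), 
             df[[x]]["Species"], 
             NA_character_))) %>% 
      rename_with(~ paste0(.x, "_s"))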
    # A tibble: 150 × 3
       a_s    b_s   c_s  
       <fct>  <chr> <chr>
     1 setosa NA    NA   
     2 setosa NA    NA   
     3 setosa NA    NA   
     4 setosa NA    NA   
     5 setosa NA    NA   
     6 setosa NA    NA   
     7 setosa NA    NA   
     8 setosa NA    NA   
     9 setosa NA    NA   
    10 setosa NA    NA   
    # … with 140 more rows
    # ℹ Use `print(n = ...)` to see more rows