R - Unlist data frame column of lists in a tidy manner


I have data in a data frame where one column is a list. This is an example:

rand_lets <- function(){
  # Draw between 5 and 11 distinct lowercase letters
  sample(letters, sample(5:11, 1))
}

example_data <- data.frame(ID = 1:5,
                           location = LETTERS[1:5],
                           observations = I(list(rand_lets(),
                                                 rand_lets(),
                                                 rand_lets(),
                                                 rand_lets(),
                                                 rand_lets())))

I am looking for an elegant tidyverse approach to unlist the list column so that each element in the list is separated into its own column. For example, the first row would look like this:

ID location observations  observations.1  observations.2  observations.3  observations.4  observations.5  observations.6  observations.7  observations.8  observations.9
1        A  "y"           "b"             "m"             "u"             "x"             "j"             "t"             "i"             "v"             "w"

Of course, the list entries may have different lengths, so empty cells should be NA.

How could this be done?


Solution

  • If you want to keep your data in "long" format, you can do:

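    library(dplyr)  # provides %>%, group_by() and mutate()
    library(tidyr)  # provides unnest() and spread()
    
    example_data %>% unnest(observations)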
    
       ID location observations
    1   1        A            e
    2   1        A            x
    3   1        A            w
    ...
    44  5        E            u
    45  5        E            o
    46  5        E            z
    

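    To spread the data to "wide" format, as in your example, number the observations within each location and then spread them into columns:

    library(stringr)
    
    example_data %>% unnest(observations) %>%
      group_by(location) %>%
      # Zero-padded counter (Obs_01, Obs_02, ...) so the new columns sort in order
      mutate(counter=paste0("Obs_", str_pad(1:n(),2,"left","0"))) %>%
      spread(counter, observations)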
    
         ID location Obs_01 Obs_02 Obs_03 Obs_04 Obs_05 Obs_06 Obs_07 Obs_08 Obs_09 Obs_10 Obs_11
    * <int>   <fctr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>  <chr>
    1     1        A      e      x      w      c      s      j      k      t      z   <NA>   <NA>
    2     2        B      k      u      d      h      z      x   <NA>   <NA>   <NA>   <NA>   <NA>
    3     3        C      v      z      m      o      s      f      n      c      r      u      b
    4     4        D      z      i      m      s      a      v      n      r      e      t      x
    5     5        E      f      b      g      h      a      d      u      o      z   <NA>   <NA>
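
In tidyr 1.0.0 and later, `spread()` is superseded by `pivot_wider()`. The same widening step could be sketched as below. This is only an illustration under some assumptions: the small `example_data` here is a made-up, deterministic stand-in for the random data in the question (built as a plain list column rather than with `I()`), and the `Obs_` prefix and the `counter` helper column simply follow the naming used above.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's data: fixed values instead of
# random letters, so the result is reproducible.
example_data <- tibble(
  ID = 1:3,
  location = LETTERS[1:3],
  observations = list(c("e", "x", "w"), c("k", "u"), c("v", "z", "m", "o"))
)

wide <- example_data %>%
  unnest(observations) %>%                      # one row per observation
  group_by(ID, location) %>%
  # Zero-padded position of each observation within its original row
  mutate(counter = sprintf("Obs_%02d", row_number())) %>%
  ungroup() %>%
  # One column per position; rows with fewer observations are filled with NA
  pivot_wider(names_from = counter, values_from = observations)

print(wide)
```

Grouping by both `ID` and `location` is a cautious choice here: it keeps the counter tied to each original row even if a location were to appear more than once.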