r dplyr tidyr tibble tidytext

Expand tibble of email dataset in R


I have a massive tibble of my email data which looks like the following:

library(dplyr)

emails <- tibble(
  from = c('employee.1@xtra.co','employee.5@xtra.co','employee.1@xtra.co',
           'employee.3@xtra.co','employee.1@xtra.co'),
  to = list(
    c('employee.5@xtra.co', 'employee.3xtra.co'),
    c('employee.3@xtra.co', 'employee.1@xtra.co'),
    c('employee.2@xtra.co'),
    c('employee.1@xtra.co'),
    c('employee.3@xtra.co','employee.5@xtra.co','employee.6@xtra.co')),
  
  cc = list(
    c('employee.2xtra.co', 'employee.4xtra.co', 'employee.6xtra.co'),
    c('employee.1xtra.co', 'employee.8xtra.co', 'employee.6xtra.co'),
    NA,
    c('employee.2xtra.co', 'employee.4xtra.co'),
    c('employee.2xtra.co', 'employee.6xtra.co'))
)

emails

# A tibble: 5 x 3
  from               to        cc       
  <chr>              <list>    <list>   
1 employee.1@xtra.co <chr [2]> <chr [3]>
2 employee.5@xtra.co <chr [2]> <chr [3]>
3 employee.1@xtra.co <chr [1]> <lgl [1]>
4 employee.3@xtra.co <chr [1]> <chr [2]>
5 employee.1@xtra.co <chr [3]> <chr [2]>

I would like to expand each record into one row per combination of to and cc. For example, the output I want for row 1 is:

from                to                  cc
employee.1@xtra.co  employee.5@xtra.co  employee.2xtra.co
employee.1@xtra.co  employee.5@xtra.co  employee.4xtra.co
employee.1@xtra.co  employee.5@xtra.co  employee.6xtra.co
employee.1@xtra.co  employee.3xtra.co   employee.2xtra.co
employee.1@xtra.co  employee.3xtra.co   employee.4xtra.co
employee.1@xtra.co  employee.3xtra.co   employee.6xtra.co

Thank you very much for your time.


Solution

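  • We can apply unnest twice, once per list column. Each call produces one row per element, so the two calls together give every to/cc combination.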

    library(dplyr)
    library(tidyr)
    
    emails2 <- emails %>%
      unnest(cols = "to") %>%
      unnest(cols = "cc")
    head(emails2)
    # # A tibble: 6 x 3
    #   from               to                 cc               
    #   <chr>              <chr>              <chr>            
    # 1 employee.1@xtra.co employee.5@xtra.co employee.2xtra.co
    # 2 employee.1@xtra.co employee.5@xtra.co employee.4xtra.co
    # 3 employee.1@xtra.co employee.5@xtra.co employee.6xtra.co
    # 4 employee.1@xtra.co employee.3xtra.co  employee.2xtra.co
    # 5 employee.1@xtra.co employee.3xtra.co  employee.4xtra.co
    # 6 employee.1@xtra.co employee.3xtra.co  employee.6xtra.co
    

    If you have more than two list columns to expand, here is one approach. First identify the list columns and store their names in names_target, then use a for loop to apply unnest to each of them in turn.

    names_target <- emails %>%
      select(where(is.list)) %>%
      names()
    
    temp <- emails
    
    for (i in names_target){
      temp <- temp %>% unnest(cols = all_of(i))
    }
    
    identical(temp, emails2)
    # [1] TRUE
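
A quick way to sanity-check the expansion, plus one caveat for real mailboxes: each email should contribute length(to) times length(cc) rows. The question's data uses NA for a missing cc, which has length 1 and so survives. If a missing cc is stored as NULL or a zero-length vector instead, that email contributes no rows and unnest drops it by default. The sketch below uses a small made-up tibble, emails_demo (the data and variable names are illustrative, not from the question), to show the count check and tidyr's keep_empty = TRUE argument, which keeps such rows with NA.

```r
library(dplyr)
library(tidyr)

# Made-up stand-in for `emails`; the third email has no cc at all (NULL),
# which is an assumption about how real data might look.
emails_demo <- tibble(
  from = c("a@xtra.co", "b@xtra.co", "c@xtra.co"),
  to   = list(c("b@xtra.co", "c@xtra.co"), "a@xtra.co", c("a@xtra.co", "b@xtra.co")),
  cc   = list(c("d@xtra.co", "e@xtra.co", "f@xtra.co"), NA, NULL)
)

# Rows each email should produce: number of recipients times number of cc entries.
# lengths() is 1 for the NA entry and 0 for the NULL entry.
expected_rows <- sum(lengths(emails_demo$to) * lengths(emails_demo$cc))

# Default behaviour: the email with the NULL cc is dropped entirely.
expanded <- emails_demo %>%
  unnest(cols = "to") %>%
  unnest(cols = "cc")

# Check that the expansion has exactly the predicted number of rows.
nrow(expanded) == expected_rows

# keep_empty = TRUE keeps that email, with NA in the cc column.
expanded_all <- emails_demo %>%
  unnest(cols = "to") %>%
  unnest(cols = "cc", keep_empty = TRUE)

# Number of rows recovered for the cc-less email.
nrow(expanded_all) - nrow(expanded)
```

If emails without a cc (or without a to) must stay in the result, passing keep_empty = TRUE to each unnest call, including the one inside the for loop, is probably the safer default.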