Search code examples
rdataframeconcatenationnaseparator

Concatenate columns, only keeping the separator if multiple values exist (non-NA values)


Does anyone know of a method in R to concatenate n columns but only keep the separator if there is a value in that row? If you run the example below:

df <- data.frame(
                  name1 = c("Jim","Bob","Sue"),
                  name2 = c("Jane","","Bane"),
                  name3 = c('Conor',"",""),
                  name4 = c("","","Bonor")
                )

df$names <- paste(df$name1,df$name2,df$name3, sep=";")

You will see that separators are included at the end and in-between values even though the cells are empty, with the output:

df =

name1 name2 name3 name4  names
Jim   Jane  Conor        Jim;Jane;Conor;
Bob                      Bob;;;
Sue   Bane        Bonor  Sue;Bane;;Bonor

Is there any method to either not include or drop separators in the case where the cells are empty? With the desired outcome:

df =

name1 name2 name3 name4  names
Jim   Jane  Conor        Jim;Jane;Conor
Bob                      Bob
Sue   Bane        Bonor  Sue;Bane;Bonor

Solution

  • library(dplyr)
    library(tidyr)
    
    df %>% 
      mutate_all(na_if,"") %>% 
      unite("names", everything(), sep = ";", remove = F, na.rm = T)
    
    #>            names name1 name2 name3 name4
    #> 1 Jim;Jane;Conor   Jim  Jane Conor  <NA>
    #> 2            Bob   Bob  <NA>  <NA>  <NA>
    #> 3 Sue;Bane;Bonor   Sue  Bane  <NA> Bonor
    

    Update: applying this solution on specific columns.

    I am modifying akrun's answer from the comment below;

    df %>% 
      mutate(across(c("name1", "name2", "name3", "name4"), na_if, "", 
                    .names = "{.col}_changed")) %>% 
      unite(names, ends_with('_changed'), na.rm = TRUE, sep = ";")
    
    #>   name1 name2 name3 name4          names
    #> 1   Jim  Jane Conor       Jim;Jane;Conor
    #> 2   Bob                              Bob
    #> 3   Sue  Bane       Bonor Sue;Bane;Bonor