Search code examples
rdplyraggregate

r collapsing data from multiple columns into one


I know there are many questions on this topic so I apologize if this is a duplicate question. I'm trying to collapse multiple columns in a data set into one column:

Assuming this is the structure of the dataset I am working with,

df <- data.frame(
      cbind(
      variable_1 = c('Var1', NA, NA,'Var1'),
      variable_2 = c('Var2', 'No', NA, NA),
      variable_3 = c(NA, NA, 'Var3', NA),
      variable_4 = c(NA, 'Var4', NA, NA),
      variable_5 = c(NA, 'No', 'Var5', NA),
      variable_6 = c(NA, NA, 'Var6', NA)
      
    ))

 variable_1  variable_2  variable_3  variable_4  variable_5  variable_6 
 Var1        Var2        NA          NA          NA          NA
 NA          No          NA          Var4        No          NA
 NA          NA          Var3        NA          Var5        Var6
 Var1        NA          NA          NA          NA          NA

What I am expecting is a one column variable_7 like this

 variable_1  variable_2  variable_3  variable_4  variable_5  variable_6  variable_7
 Var1        Var2        NA          NA          NA          NA          Var1, Var2
 NA          No          NA          Var4        No          NA          Var4
 NA          NA          Var3        NA          Var5        Var6        Var3, Var5, Var6
 Var1        NA          NA          NA          NA          NA          Var1

Solution

  • df$variable_7 <- apply(df, 1, function(x) paste(x[!is.na(x) & x != "No"], collapse = ", "));
    df;
    #  variable_1 variable_2 variable_3 variable_4 variable_5 variable_6
    #1       Var1       Var2       <NA>       <NA>       <NA>       <NA>
    #2       <NA>         No       <NA>       Var4         No       <NA>
    #3       <NA>       <NA>       Var3       <NA>       Var5       Var6
    #4       Var1       <NA>       <NA>       <NA>       <NA>       <NA>
    #        variable_7
    #1       Var1, Var2
    #2             Var4
    #3 Var3, Var5, Var6
    #4             Var1
    

    Explanation: Use apply and paste(..., collapse = ", ") to concatenate all row entries (except NAs and "No"s) and store in new column variable_7.


    Sample data

    df <- data.frame(
          cbind(
          variable_1 = c('Var1', NA, NA,'Var1'),
          variable_2 = c('Var2', 'No', NA, NA),
          variable_3 = c(NA, NA, 'Var3', NA),
          variable_4 = c(NA, 'Var4', NA, NA),
          variable_5 = c(NA, 'No', 'Var5', NA),
          variable_6 = c(NA, NA, 'Var6', NA)
    
        ))