Search code examples
rdplyrcoalescedata-wrangling

Combine Columns with Similar Column Names / Similar Column Strings - Reprex


Goals: To merge multiple columns just based on the similarity of the column name.

Issues: I am dealing with a large data set where the column names are replicated and look like this: wk1.1, wk1.2, wk1.3. For each row, there will be only one value in the similar column names, and the others will be NA. Coalesce is very helpful, but becomes tedious (messes up automation) when I have to list each column name. Is there a way to coalesce based off a string of characters? For instance below, I would prefer to coalesce %in% "wk1."

library(dplyr)
wk1.1 <- c(15, 4, 1)
wk1.2 <- c(3, 20, 4)
wk1.3 <- c(1, 2, 17)

df <- data.frame(wk1.1, wk1.2, wk1.3)
df[df < 14] <- NA
df1 <- df %>%
  mutate(wk1 = coalesce(df$wk1.1, df$wk1.2, df$wk1.3))

Solution

  • We can use splice it with !!!

    library(dplyr)
    df %>%
          mutate(wk1 = coalesce(!!! .))
    #  wk1.1 wk1.2 wk1.3 wk1
    #1    15    NA    NA  15
    #2    NA    20    NA  20
    #3    NA    NA    17  17
    

    Or another option is to reduce and apply coalesce

    library(purrr)
    df %>%
       mutate(wk1 = reduce(., coalesce))