Search code examples
rdata-manipulation

Conditionally Concatenating Strings in R


I have this dataset in R:

id = 1:5
col1 = c("12 ABC", "123", "AB", "123344567", "1345677.")
col2 = c("gggw", "12", "567", "abc 123", "p")
col3 = c("abw", "abi", "klo", "poy", "17df")
col4 = c("13 AB", "344", "Huh8", "98", "b")
    
my_data = data.frame(id, col1, col2, col3, col4)

 id      col1    col2 col3  col4
1  1    12 ABC    gggw  abw 13 AB
2  2       123      12  abi   344
3  3        AB     567  klo  Huh8
4  4 123344567 abc 123  poy    98
5  5  1345677.       p 17df     b

I then used the following code to check to see if a specific cell contains AT LEAST one number:

my_data$col1_check = grepl("\\d", my_data$col1)
my_data$col2_check = grepl("\\d", my_data$col2)
my_data$col3_check = grepl("\\d", my_data$col3)
my_data$col4_check = grepl("\\d", my_data$col4)

  id      col1    col2 col3  col4 col1_check col2_check col3_check col4_check
1  1    12 ABC    gggw  abw 13 AB       TRUE      FALSE      FALSE       TRUE
2  2       123      12  abi   344       TRUE       TRUE      FALSE       TRUE
3  3        AB     567  klo  Huh8      FALSE       TRUE      FALSE       TRUE
4  4 123344567 abc 123  poy    98       TRUE       TRUE      FALSE       TRUE
5  5  1345677.       p 17df     b       TRUE      FALSE       TRUE      FALSE

What I am trying to do, is for each row : I would like to take all columns in which the value is FALSE, and paste (with a space) the contents of these columns into a single cell.

This would look something like this:

 id  new_col
1  1 gggw abw
2  2      abi
3  3   AB klo
4  4      poy
5  5      p b

I have been trying to read about "conditional concatenation" (e.g. conditional concatenation in R), but so far nothing I have read matches the problem I am working on.

Can someone please suggest what to do from here?

Thanks!


Solution

  • Here is one option in tidyverse - loop across the columns col1 to col4, get the corresponding value from the logical column by pasteing the _check on the column names (cur_column()), convert the TRUE values to NA in case_when and unite those columns to new_col

    library(stringr)
    library(dplyr)
    library(tidyr)
     my_data %>%
       transmute(id, across(col1:col4, 
        ~ case_when(!get(str_c(cur_column(), "_check"))~ .x))) %>% 
       unite(new_col, col1:col4, sep = " ", na.rm = TRUE)
    

    -output

     id  new_col
    1  1 gggw abw
    2  2      abi
    3  3   AB klo
    4  4      poy
    5  5      p b
    

    If we want to skip creating the _check, it will be easier as we can directly convert the elements that are not needed to NA and unite

    my_data %>%
       mutate(across(col1:col4,
        ~ case_when(str_detect(.x, "\\d+", negate = TRUE)  ~.x))) %>% 
       unite(new_col, col1:col4, sep = " ", na.rm = TRUE)
    

    -output

      id  new_col
    1  1 gggw abw
    2  2      abi
    3  3   AB klo
    4  4      poy
    5  5      p b
    

    Or using base R

    cbind(my_data[1], new_col = gsub("\\s{2,}", " ", 
        trimws(do.call(paste, replace(my_data[2:5], 
         as.matrix(my_data[6:9]), '')))))
    

    -output

     id  new_col
    1  1 gggw abw
    2  2      abi
    3  3   AB klo
    4  4      poy
    5  5      p b