
R spread columns with a specific pattern


I have a data.frame with a column like this:

Column_1
AAA
B
BBB
AAA_FACE
CCC
BBB_AAA

I want to spread the column into new columns, but only for the values containing a specific pattern: "AAA". Spreading all unique values is not an option, because that would give me far too many columns.
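To make the pattern requirement concrete, here is a minimal base-R sketch, assuming the column sits in a data.frame called df1 (the name the answer uses). It lists which distinct values would become new columns; all_values and aaa_values are illustrative names.

```r
df1 <- data.frame(Column_1 = c("AAA", "B", "BBB", "AAA_FACE", "CCC", "BBB_AAA"),
                  stringsAsFactors = FALSE)

# All distinct values: spreading every one of these gives one column each
all_values <- unique(df1$Column_1)

# Only the distinct values containing "AAA" should become new columns
aaa_values <- grep("AAA", all_values, value = TRUE)
print(aaa_values)
```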

After spreading, I want the new columns to be binary indicators, so ideally my new data.frame looks like this:

AAA    AAA_FACE     BBB_AAA 
 1        0           0 
 0        0           0 
 0        0           0 
 0        1           0 
 0        0           0 
 0        0           1 

I tried tidyr's spread() function, but it spreads the data into many, many columns (one per unique value) instead of only the columns matching the 'AAA' pattern.


Solution

  • One option with the tidyverse would be:

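    library(tidyverse)

    # example data from the question
    df1 <- data.frame(Column_1 = c("AAA", "B", "BBB", "AAA_FACE", "CCC", "BBB_AAA"),
                      stringsAsFactors = FALSE)
    df1 %>% 
      # i1: 1 if the value contains "AAA", else 0; rn keeps every row distinct
      mutate(i1 = as.integer(str_detect(Column_1, "AAA")), 
             rn = row_number()) %>%
      spread(Column_1, i1, fill = 0) %>%   # one column per unique value
      select(matches("AAA"))               # keep only the "AAA" columns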
    #   AAA AAA_FACE BBB_AAA
    #1   1        0       0
    #2   0        0       0
    #3   0        0       0
    #4   0        1       0
    #5   0        0       0
    #6   0        0       1
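    Since tidyr 1.0.0, spread() is superseded by pivot_wider(). A minimal sketch of the same approach with pivot_wider(), assuming tidyr 1.0.0 or later; base grepl() stands in for stringr::str_detect(), and out is an illustrative name for the result. pivot_wider() orders the new columns by first appearance rather than alphabetically, which makes no difference here once select() keeps only the "AAA" columns.

```r
library(dplyr)
library(tidyr)

# example data from the question
df1 <- data.frame(Column_1 = c("AAA", "B", "BBB", "AAA_FACE", "CCC", "BBB_AAA"),
                  stringsAsFactors = FALSE)

out <- df1 %>%
  # i1: 1 if the value contains "AAA", else 0; rn identifies each original row
  mutate(i1 = as.integer(grepl("AAA", Column_1)),
         rn = row_number()) %>%
  # one column per unique value; missing cells are filled with 0
  pivot_wider(id_cols = rn, names_from = Column_1, values_from = i1,
              values_fill = list(i1 = 0L)) %>%
  # keep only the columns whose names contain "AAA" (this also drops rn)
  select(matches("AAA"))

print(out)
```

    The named-list form of values_fill is used because it is accepted by every tidyr release that has pivot_wider(); newer releases also accept a plain 0.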
    

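    The spread() pipeline can be made a bit more efficient by replacing the non-matching values with NA before spreading, so that all of them share a single NA key instead of each producing its own column: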

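    df1 %>%
      mutate(i1 = as.integer(str_detect(Column_1, "AAA")),
             # non-matching values share a single NA key instead of one column each
             Column_1 = replace(Column_1, !i1, NA), 
             rn = row_number()) %>% 
      spread(Column_1, i1, fill = 0) %>% 
      select(matches("AAA"))     # also drops the NA column and rn
    # gives the same result as above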
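    If you'd rather skip the reshape (and the tidyverse dependency), a base-R sketch can build the indicator columns directly: grep() picks the distinct values containing "AAA", and outer() compares every row against each of them. This is a swapped-in alternative, not part of the answer above; keys, hits and out are illustrative names.

```r
df1 <- data.frame(Column_1 = c("AAA", "B", "BBB", "AAA_FACE", "CCC", "BBB_AAA"),
                  stringsAsFactors = FALSE)

# Distinct values that contain the pattern; these become the new columns
keys <- sort(unique(grep("AAA", df1$Column_1, value = TRUE)))

# Logical matrix: one row per original row, one column per key,
# TRUE where the row's value equals that key
hits <- outer(df1$Column_1, keys, FUN = "==")

# Turn TRUE/FALSE into 1/0 and label the columns
out <- as.data.frame(hits * 1L)
colnames(out) <- keys
print(out)
```

    Because only the matching values are ever turned into columns, no throwaway columns are created and then dropped.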