Search code examples
rmergegroupingr-caretdummy-variable

Merge binary variables into grouping variables by the start of the name in R


I have a dataset like original with numeric (NP) and binary (all the rest) variables (my dataset is much larger and includes way more numeric and dummies):

NP <- c(4,6,18,1,3,12,8)
iso_mode_USA <- c(1, 0, 0, 0, 0, 1, 1)
iso_mode_CHN <- c(0, 1, 1, 0, 0, 0, 0)
iso_mode_COL <- c(0, 0, 0, 1, 1, 0, 0)
exp_sector_4 <- c(0, 1, 0, 0, 1, 0, 0)
exp_sector_5 <- c(1, 0, 1, 0, 0, 0, 0)
exp_sector_7 <- c(0, 0, 0, 1, 0, 1, 1)
original <- data.frame(NP, iso_mode_USA, iso_mode_CHN, iso_mode_COL, exp_sector_4, exp_sector_5, exp_sector_7)

I want to merge the binary variables by the start of name (e.g. all "iso_mode_" together, all "exp_sector_" together, and so on with the rest of variables I have in my dataset not presented here). In the new variables generated I want to input the value that is at the end of the colname() of the original variable.

So the new dataset looks like:

NP <- c(4,6,18,1,3,12,8)
iso_mode <- c('USA', 'CHN', 'CHN', 'COL', 'COL', 'USA', 'USA')
exp_sector <- c('5', '4', '4', '7', '4', '7', '7')
new <- data.frame(NP,iso_mode, exp_sector)

I have tried with the following link. However it does not make the trick to select the dummies by names (I have around 1700 dummies in my dataset belonging to only around 20 groups)

Any solution for dplyr() environment?

Thank you


Solution

  • Bring the data in long format, keep only rows with value 1, split the column names into two columns and arrange the data back in wide format.

    library(dplyr)
    library(tidyr)
    
    original %>%
      pivot_longer(cols = -NP) %>%
      filter(value == 1) %>%
      extract(name, c('col', 'val'), '(\\w+_\\w+)_(\\w+)') %>%
      select(-value) %>%
      pivot_wider(names_from = col, values_from = val)
    
    #     NP iso_mode exp_sector
    #  <dbl> <chr>    <chr>     
    #1     4 USA      5         
    #2     6 CHN      4         
    #3    18 CHN      5         
    #4     1 COL      7         
    #5     3 COL      4         
    #6    12 USA      7         
    #7     8 USA      7