I am practicing the DRY principle in my R code, and I have reached a point where I cannot reduce the number of lines any further. The code is very repetitive and I would like your help.
Here is a reproducible example:
library(tidyverse)
library(magrittr) # provides %<>%, which library(tidyverse) does not attach
set.seed(2023)
# first, I generate the data
data <- as.data.frame(cbind(
replicate(10, sample(0:1, 7, replace = TRUE)),
replicate(10, sample(30:100, 7, replace = TRUE))
))
names(data) <- c(sprintf("var1_%02d", 1:10), sprintf("var2_%02d", 1:10))
data
# var1_01 var1_02 var1_03 var1_04 var1_05 var1_06 var1_07 var1_08 var1_09 var1_10 var2_01 var2_02 var2_03 var2_04 var2_05 var2_06 var2_07 var2_08 var2_09 var2_10
# 1 0 1 0 0 0 0 0 0 0 0 61 72 74 58 85 93 85 46 99 55
# 2 1 1 0 1 0 0 0 1 1 0 66 56 91 72 77 53 61 34 57 43
# 3 0 0 1 1 1 1 0 1 1 1 71 89 49 99 38 84 53 41 95 64
# 4 0 0 0 0 1 0 1 1 1 1 50 91 83 61 81 41 71 83 96 81
# 5 1 0 1 1 1 1 1 1 0 1 41 61 79 67 96 98 97 60 36 90
# 6 0 0 0 1 1 1 1 1 1 1 60 93 39 86 53 82 69 39 67 54
# 7 1 0 0 0 1 0 0 1 1 0 57 96 82 47 95 41 100 53 98 45
This is the code I want to reduce:
data %<>%
mutate(var3_01 = case_when(var1_01 == 1 ~ var2_01 + 0, TRUE ~ 0),
var3_02 = case_when(var1_02 == 1 ~ var2_02 + 0, TRUE ~ 0),
var3_03 = case_when(var1_03 == 1 ~ var2_03 + 0, TRUE ~ 0),
var3_04 = case_when(var1_04 == 1 ~ var2_04 + 0, TRUE ~ 0),
var3_05 = case_when(var1_05 == 1 ~ var2_05 + 0, TRUE ~ 0),
var3_06 = case_when(var1_06 == 1 ~ var2_06 + 0, TRUE ~ 0),
var3_07 = case_when(var1_07 == 1 ~ var2_07 + 0, TRUE ~ 0),
var3_08 = case_when(var1_08 == 1 ~ var2_08 + 0, TRUE ~ 0),
var3_09 = case_when(var1_09 == 1 ~ var2_09 + 0, TRUE ~ 0),
var3_10 = case_when(var1_10 == 1 ~ var2_10 + 0, TRUE ~ 0))
The goal is that, for each row, var3_* takes the value of var2_* if var1_* == 1, and 0 otherwise. However, I have not been able to replicate this code in a shorter version (tidyverse or base, it doesn't matter). I tried this:
numbers <- c(paste0("0", 1:5))
data %<>%
mutate(across(starts_with("var1_"), ~ifelse(isTRUE(.x==1), .x:=data[, 6:10], 0), .names="var3_{numbers}"))
But this code does not generate the same result as the extended version. I appreciate any suggestion!
EDIT: Thank you all for your suggestions and for editing the reproducible example. I WAS ABLE TO SOLVE MY DOUBTS and I learned a lot with your answers. Best wishes to all!
Staying within tidyverse
You can use across, with get and cur_column inside ifelse (in place of case_when), to avoid the repetition: get looks up the matching var2_* column by name.
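# the ten var1_* columns
cols <- names(data)[1:10]
data |>
  mutate(across(all_of(cols), \(x) {
    # cur_column() is the current var1_* name; get() fetches its var2_* partner
    ifelse(x == 1, get(sub("var1", "var2", cur_column())), 0)
  }, .names = "{sub('var1', 'var3', .col)}"))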
var1_01 var1_02 var1_03 var1_04 var1_05 var1_06 var1_07 var1_08 var1_09 var1_10 var2_01 var2_02 var2_03 var2_04
1 0 0 1 1 1 0 0 1 1 1 31 74 42 60
2 0 1 0 0 1 0 1 0 1 1 92 63 57 98
3 1 1 0 1 0 0 0 1 1 0 53 89 64 42
4 0 1 0 0 0 1 0 1 1 1 55 37 41 97
5 0 0 0 0 1 1 0 0 0 1 47 87 56 60
6 0 0 1 0 1 0 0 0 0 1 99 73 79 31
7 1 0 0 1 0 0 0 1 1 0 61 44 52 90
var2_05 var2_06 var2_07 var2_08 var2_09 var2_10 var3_01 var3_02 var3_03 var3_04 var3_05 var3_06 var3_07 var3_08
1 60 55 57 67 97 40 0 0 42 60 60 0 0 67
2 97 78 74 30 90 49 0 63 0 0 97 0 74 0
3 77 43 52 84 43 78 53 89 0 42 0 0 0 84
4 95 94 65 86 32 82 0 37 0 0 0 94 0 86
5 47 65 100 70 91 40 0 0 0 0 47 65 0 0
6 93 77 92 57 76 93 0 0 79 0 93 0 0 0
7 46 100 74 35 38 56 61 0 0 90 0 0 0 35
var3_09 var3_10
1 97 40
2 90 49
3 43 0
4 32 82
5 0 40
6 0 93
7 38 0
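Since the question says a base version would also be fine, here is a minimal base R sketch of the same idea. It assumes the var1_* and var2_* columns share the same suffixes and that there are no NA values; the helper names v1, v2 and v3 are made up for illustration, and the data is regenerated as in the question so the snippet runs on its own.

```r
# Recreate the question's data so the sketch is self-contained
set.seed(2023)
data <- as.data.frame(cbind(
  replicate(10, sample(0:1, 7, replace = TRUE)),
  replicate(10, sample(30:100, 7, replace = TRUE))
))
names(data) <- c(sprintf("var1_%02d", 1:10), sprintf("var2_%02d", 1:10))

# Column names paired by their shared suffix (v1, v2, v3 are illustrative names)
v1 <- grep("^var1_", names(data), value = TRUE)
v2 <- sub("^var1", "var2", v1)
v3 <- sub("^var1", "var3", v1)

# Walk over the var1/var2 pairs in parallel and build each var3 column
data[v3] <- Map(function(flag, value) ifelse(flag == 1, value, 0),
                data[v1], data[v2])
```

Map pairs the columns positionally, so this relies on v1 and v2 being in the same order, which holds here because v2 is derived from v1.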