I have a dataset with many columns. I am interested in the columns that contain "dx_" within the variable name. I would like to create an indicator variable that is 1 in every row where at least one of the columns whose names contain "dx_" has a value that starts with "493". For example:
df = data.frame(var1 = c(1,2,3,4,5),var2 = c(5,4,3,2,1),dx_1 = c("493","XH","1493","4938B","LP23"),dx_2 = c("AB","0PC3","MNP","12GT","FPN2"),a_dx_3 = c("FTR","2RTN","92KS","J294","493V"))
> df
  var1 var2  dx_1 dx_2 a_dx_3
1    1    5   493   AB    FTR
2    2    4    XH 0PC3   2RTN
3    3    3  1493  MNP   92KS
4    4    2 4938B 12GT   J294
5    5    1  LP23 FPN2   493V
I would like to create a new variable, Z, that is 1 if any of dx_1, dx_2, or a_dx_3 has a value that starts with "493" in that row, and 0 otherwise. However, I need the solution to be flexible so that I don't have to specify the columns beyond saying contains("dx_").
I would like my answer to look like this:
  var1 var2  dx_1 dx_2 a_dx_3 Z
1    1    5   493   AB    FTR 1
2    2    4    XH 0PC3   2RTN 0
3    3    3  1493  MNP   92KS 0
4    4    2 4938B 12GT   J294 1
5    5    1  LP23 FPN2   493V 1
This is my failed attempt: First I create a helper function to recognize the string:
detect_493_fn <- function(str){
  ans = if_else(str_starts(str,"493") == TRUE,
                1,
                0)
  return(ans)
}
And then use a combination of if_any, across, and contains:
ans <- df %>%
  mutate(Z = case_when(
    if_any(across(contains("dx_"), ~detect_493_fn(.))) ~ 1,
    TRUE ~ 0))
but I get this error:
Error in `mutate()`:
! Problem while computing `Z = case_when(...)`.
Caused by error in `if_any()`:
! Must subset columns with a valid subscript vector.
x Subscript has the wrong type `tbl_df<
dx_1 : double
dx_2 : double
a_dx_3: double
>`.
i It must be numeric or character.
I would be so grateful if someone could help me. Thanks!
You can do:
library(dplyr)
library(stringr)
df %>%
  mutate(Z = as.numeric(if_any(contains("dx_"), str_starts, "493")))
  var1 var2  dx_1 dx_2 a_dx_3 Z
1    1    5   493   AB    FTR 1
2    2    4    XH 0PC3   2RTN 0
3    3    3  1493  MNP   92KS 0
4    4    2 4938B 12GT   J294 1
5    5    1  LP23 FPN2   493V 1
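The same call can also be written with an anonymous function, which spells out what is applied to each selected column. This is only a sketch: it assumes dplyr 1.0.4 or later (where if_any() is available) and swaps in base R's startsWith() for str_starts(), so stringr is not needed. The name df_z is made up for illustration.

```r
library(dplyr)

# Example data from the question; stringsAsFactors = FALSE keeps the
# dx columns as character vectors on older R versions as well.
df <- data.frame(
  var1 = c(1, 2, 3, 4, 5),
  var2 = c(5, 4, 3, 2, 1),
  dx_1 = c("493", "XH", "1493", "4938B", "LP23"),
  dx_2 = c("AB", "0PC3", "MNP", "12GT", "FPN2"),
  a_dx_3 = c("FTR", "2RTN", "92KS", "J294", "493V"),
  stringsAsFactors = FALSE
)

# if_any() receives the column selection directly and applies the
# function to each selected column; a row is TRUE when at least one
# of those columns starts with "493".
df_z <- df %>%
  mutate(Z = as.numeric(if_any(contains("dx_"), ~ startsWith(.x, "493"))))

df_z$Z
# [1] 1 0 0 1 1
```

Note that "1493" in row 3 is not flagged, because the test is anchored at the start of the string rather than matching "493" anywhere.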
Consider keeping your Z variable as logical. if_any() is an across() variant, so you use it in place of across(), not with it.
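As a rough illustration of the point about keeping Z logical, the sketch below drops the as.numeric() conversion and uses the flag directly. It assumes the example data from the question; the name flagged is hypothetical.

```r
library(dplyr)
library(stringr)

# Example data from the question.
df <- data.frame(
  var1 = c(1, 2, 3, 4, 5),
  var2 = c(5, 4, 3, 2, 1),
  dx_1 = c("493", "XH", "1493", "4938B", "LP23"),
  dx_2 = c("AB", "0PC3", "MNP", "12GT", "FPN2"),
  a_dx_3 = c("FTR", "2RTN", "92KS", "J294", "493V"),
  stringsAsFactors = FALSE
)

# Z stays TRUE/FALSE: if_any() is called on the column selection itself,
# not wrapped around across().
flagged <- df %>%
  mutate(Z = if_any(contains("dx_"), ~ str_starts(.x, "493")))

# A logical column works directly as a filter condition.
flagged %>% filter(Z)

# It still counts matching rows when summed (TRUE is treated as 1).
sum(flagged$Z)
```

If a 0/1 column is needed later, for example for export, as.integer(flagged$Z) converts it without changing which rows are flagged.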