I have a data frame (named fy14y) with 100 variables (c1 to c100) containing alphanumeric codes of varying length (e.g. 1-S023, 2-Y0408).
What would be the best way to create a binary variable that detects if one or more of the codes from the following list is present in a row?
1-A400, 1-A401, 1-A402, 1-A410, 1-A415, 1-A4152, 1-A4158, 1-B377, 1-P360, 1-P362, 1-P364, 1-U900
i.e. 0 if none of them appear, 1 if one or more of them appear (once or multiple times).
I've played around with the stringr function str_detect() without much luck.
thanks!
I think an intuitive way to do this is to put the data in long form, search for a match by id, and then put it back in wide form. We can do this easily with tidyr and dplyr.
Using the df and codes_to_check from the answer by Ronak Shah:
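Ronak Shah's answer isn't reproduced here, so for a self-contained run the sketch below rebuilds both objects. This is an assumption, not that answer's exact code: df is reconstructed from the printed output further down (an integer id column plus c1 and c2), and codes_to_check is the code list from the question.

```r
# Hypothetical stand-in for the objects from Ronak Shah's answer:
# df is rebuilt from the printed output of the pipeline,
# codes_to_check is the list of codes given in the question.
df <- data.frame(
  id = 1:4,
  c1 = c("1-A400", "a", "b", "c"),
  c2 = c("a", "b", "1-A401", "d")
)

codes_to_check <- c(
  "1-A400", "1-A401", "1-A402", "1-A410", "1-A415", "1-A4152",
  "1-A4158", "1-B377", "1-P360", "1-P362", "1-P364", "1-U900"
)
```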
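```r
library(dplyr) # mutate(.by = ) needs dplyr >= 1.1.0
library(tidyr) # the native pipe |> needs R >= 4.1

df |>
  # one row per (id, column) pair; column names go to `name`, codes to `value`
  pivot_longer(-id) |>
  # within each id, flag whether any listed code appears (exact match);
  # +() turns TRUE/FALSE into 1/0
  mutate(
    row_match = +(any(codes_to_check %in% value)), .by = id
  ) |>
  # back to one row per id, restoring the original columns
  pivot_wider()
# # A tibble: 4 × 4
#      id row_match c1     c2
#   <int>     <int> <chr>  <chr>
# 1     1         1 1-A400 a
# 2     2         0 a      b
# 3     3         1 b      1-A401
# 4     4         0 c      d
```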
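This should be faster than iterating over rows (e.g. with apply() or rowwise()). To use it on fy14y, add an id column first if there isn't one (mutate(id = row_number())), and if the table has columns other than c1 to c100, select those explicitly, e.g. pivot_longer(c1:c100) when the code columns are adjacent.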
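If reshaping isn't needed, the same flag can also be computed column by column, which likewise avoids looping over rows. This is a minimal base-R sketch, not part of the original answer: the small fy14y below is made-up data standing in for the question's table, and it assumes the code columns are named c1, c2, and so on.

```r
# Hypothetical miniature of fy14y (the real table has c1 ... c100).
fy14y <- data.frame(
  c1 = c("1-S023", "1-A4152", "2-Y0408"),
  c2 = c("2-Y0408", NA, "1-A4150"),
  c3 = c("1-U900", "1-S023", NA)
)

codes_to_check <- c(
  "1-A400", "1-A401", "1-A402", "1-A410", "1-A415", "1-A4152",
  "1-A4158", "1-B377", "1-P360", "1-P362", "1-P364", "1-U900"
)

code_cols <- paste0("c", 1:3) # paste0("c", 1:100) for the full table

# One logical vector per column; %in% matches exactly, so "1-A4150" is not
# counted as "1-A415", and NA simply gives FALSE. Reduce() ORs the columns.
hits <- Reduce(`|`, lapply(fy14y[code_cols], function(col) col %in% codes_to_check))

fy14y$any_code <- as.integer(hits)
fy14y$any_code
# [1] 1 1 0
```

The exact matching of %in% is also why this sidesteps a common pitfall with an unanchored str_detect() pattern, where "1-A415" would match inside "1-A4150" as well.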