Detecting the presence of a set of alphanumeric codes in a data frame

I have a data frame (named fy14y) with 100 variables (c1 to c100) containing alphanumeric codes of varying length (e.g. 1-S023, 2-Y0408).

What would be the best way to create a binary variable that detects if one or more of the codes from the following list is present in a row?

1-A400, 1-A401, 1-A402, 1-A410, 1-A415, 1-A4152, 1-A4158, 1-B377, 1-P360, 1-P362, 1-P364, 1-U900

That is, 0 if none of them appear in the row, and 1 if one or more of them appear, whether once or multiple times.

I've played around with the stringr str_detect function without much luck.

Thanks!

Solution

I think an intuitive way to do this is to put the data in long form, check each id's values for a match and then put it back in wide form. We can do this easily with tidyr and dplyr.

Using the df and codes_to_check from the answer by Ronak Shah:

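library(dplyr)  # .by needs dplyr >= 1.1.0
library(tidyr)

df |>
    pivot_longer(-id) |>  # long form: one row per id/column pair
    mutate(
        # 1 if any of the codes appears among this id's values, else 0
        row_match = +(any(codes_to_check %in% value)), .by = id
    ) |>
    pivot_wider()  # back to wide form, keeping row_match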

# # A tibble: 4 × 4
#      id row_match c1     c2    
#   <int>     <int> <chr>  <chr> 
# 1     1         1 1-A400 a     
# 2     2         0 a      b     
# 3     3         1 b      1-A401
# 4     4         0 c      d

This should be faster than iterating over rows explicitly, though it is worth timing on the full data.
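
Since df and codes_to_check come from another answer, here is a minimal self-contained sketch of the same idea. The codes are the ones listed in the question; the toy df is an assumption, reconstructed from the printed output above rather than copied from the other answer.

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for the .by argument
library(tidyr)

# Codes taken from the question
codes_to_check <- c("1-A400", "1-A401", "1-A402", "1-A410", "1-A415", "1-A4152",
                    "1-A4158", "1-B377", "1-P360", "1-P362", "1-P364", "1-U900")

# Toy data (assumed): rebuilt from the output shown above
df <- tibble(
    id = 1:4,
    c1 = c("1-A400", "a", "b", "c"),
    c2 = c("a", "b", "1-A401", "d")
)

result <- df |>
    pivot_longer(-id) |>
    # any() collapses each id's matches to one TRUE/FALSE; unary + makes it 1/0
    mutate(row_match = +(any(value %in% codes_to_check)), .by = id) |>
    pivot_wider()

result$row_match
# [1] 1 0 1 0
```

If the real data (fy14y in the question) has no id column, one could be added first with mutate(id = row_number()) so that each original row forms its own group.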