Tags: r, dplyr, tidyverse, tidyr, data-manipulation

Is there an R function that extracts a single digit from a numeric variable regardless of its position (not just the first or last digit)?


I have seen many posts on extracting the first or last number in a numeric variable with functions like gsub or grep. However, I want to detect a specific digit regardless of whether it is the first, middle, or last digit of a larger number. For example, I want R to check whether each row of a column contains the digit 3 and, if so, set a new variable to 1 (yes), otherwise 0 (no).

Let's say I have this dataframe:

have <- data.frame(Q14 = c(13, 3, 788, 134, 56, 3214, 1036))

This is the second column I want to generate: Q14_3 is 1 when the value of Q14 in that row contains a 3 anywhere, and 0 when it does not.

want <- data.frame(Q14   = c(13, 3, 788, 134, 56, 3214, 1036),
                   Q14_3 = c(1, 1, 0, 1, 0, 1, 1))

Thank you!


Solution

  • In base R, use grepl to create a logical vector, then the unary + to convert it to a 1/0 variable:

    have$Q14_3 <- +grepl(3, have$Q14)
    
    #    Q14 Q14_3
    # 1   13     1
    # 2    3     1
    # 3  788     0
    # 4  134     1
    # 5   56     0
    # 6 3214     1
    # 7 1036     1
    

    Or, since the question is tagged tidyverse, an approach using dplyr::mutate and stringr::str_detect (assign the result back to have to keep the new column):

    library(dplyr)
    library(stringr)
    
    have %>%
      mutate(Q14_3 = +str_detect(Q14, "3"))
    

    Test:

    all.equal(have, want)
    # [1] TRUE
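
Both approaches above rely on the numbers being converted to text before matching. For large values that conversion can switch to scientific notation (for example, as.character(100000) gives "1e+05"), which could produce a false match on a digit such as 5. A minimal sketch of a reusable helper that avoids this, assuming Q14 holds whole numbers only; the name has_digit is hypothetical and not part of the original answer:

```r
# Hypothetical helper (not from the original answer): returns 1L where
# `digit` appears anywhere in `x`, and 0L otherwise.
# Assumption: `x` contains whole numbers.
has_digit <- function(x, digit) {
  # sprintf("%.0f", ...) writes each number out in full, so large values
  # do not turn into strings such as "1e+05"
  x_chr <- sprintf("%.0f", x)
  # fixed = TRUE matches the digit literally instead of as a regex
  as.integer(grepl(as.character(digit), x_chr, fixed = TRUE))
}

have <- data.frame(Q14 = c(13, 3, 788, 134, 56, 3214, 1036))
have$Q14_3 <- has_digit(have$Q14, 3)
have$Q14_3
# [1] 1 1 0 1 0 1 1
```

The same helper can be called with other digits, for example has_digit(have$Q14, 7), if an indicator column is needed for each possible answer.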