
What is the best way to deal with categorical data in R where many responses include multiple categories?


I am working in R with a data set where respondents indicated their race by selecting one or more racial categories. In my dataset, the categories are coded as follows: 1 = Asian, 2 = Arab, 3 = Black, 4 = Latino, 5 = Native American, 6 = Pacific Islander / Native Hawaiian, 7 = White, 8 = Other, 9 = Prefer not to answer. Many respondents selected multiple categories, which are stored as a single comma-separated string. For example:

race <- c('1', '2', '5', '1,5')

I am trying to change the numbers to the actual name of each racial group using dplyr's recode function.

race <- dplyr::recode(race, '1'='Asian', '2'='Arab', '3'='Black', '4'='Latinx',
                      '5'='American Indian', '6'='Pacific Islander/Native Hawaiian',
                      '7'='White', '8'='Other/None', '9'='Prefer Not to Answer')

However, I don't know how to handle respondents who selected more than one racial group. For example, with the code above, a person who selected Asian (1) and Native American (5) shows up as "1,5" instead of "Asian,American Indian", because recode() only matches whole values. My output looks like this:

[1] "Asian"           "Arab"            "American Indian" "1,5"   

My end goal is to convert all the numeric codes to the name of each group so the output is easier to read. I want it to look like this:

[1] "Asian"                 "Arab"                  "American Indian"       "Asian,American Indian"

What is the best way to do this?


Solution

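stringr::str_replace_all() accepts a named vector of pattern = replacement pairs and replaces every match inside each string, so each code in a multi-code response such as '1,5' is replaced. The names are treated as regular expressions and applied one after another; that is safe here because every code is a single digit and none of the labels contains a digit.

race_text <- stringr::str_replace_all(race, c('1'='Asian', '2'='Arab',
                                              '3'='Black', '4'='Latinx', '5'='American Indian',
                                              '6'='Pacific Islander/Native Hawaiian',
                                              '7'='White', '8'='Other/None', '9'='Prefer Not to Answer'))

> race_text
[1] "Asian"                 "Arab"                  "American Indian"       "Asian,American Indian"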
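An alternative sketch, not part of the original answer, avoids regular expressions entirely: split each response on commas, look up each code by exact name in a codebook vector, and paste the labels back together. The names race_labels, codes_list, and race_indicators are hypothetical. Because the lookup is an exact match, it would keep working if codes ever reached two digits (e.g. '10'), where a bare '1' pattern would also match inside '10'. The last part additionally builds one TRUE/FALSE column per category, which is often easier to tabulate or model than a combined string.

```r
# Hypothetical codebook: names are the codes stored in the data, values are labels.
race_labels <- c('1' = 'Asian', '2' = 'Arab', '3' = 'Black', '4' = 'Latinx',
                 '5' = 'American Indian', '6' = 'Pacific Islander/Native Hawaiian',
                 '7' = 'White', '8' = 'Other/None', '9' = 'Prefer Not to Answer')

race <- c('1', '2', '5', '1,5')

# Split each response on commas and trim stray spaces, giving a list of code vectors.
codes_list <- lapply(strsplit(race, ","), trimws)

# Look up each code by exact name, then join the labels back with commas.
race_text <- vapply(codes_list,
                    function(codes) paste(race_labels[codes], collapse = ","),
                    character(1))
race_text
# values: "Asian", "Arab", "American Indian", "Asian,American Indian"

# Optional: one logical column per category (rows = respondents),
# TRUE when that respondent selected the category.
race_indicators <- sapply(names(race_labels), function(code) {
  vapply(codes_list, function(codes) code %in% codes, logical(1))
})
colnames(race_indicators) <- unname(race_labels)
colSums(race_indicators)  # number of respondents selecting each category
```

A code missing from the codebook looks up to NA and shows up as the text "NA" inside race_text (e.g. "Asian,NA"), which makes unmapped codes easy to spot.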