Recoding race variables into multiracial category by group


I am trying to find the best way to recode the values in a column when a name is associated with more than one race.

I have been working with a dataframe like this:

df <- data.frame('Name' = c("Jon", "Jon", "Bobby", "Sarah", "Fred"),
                 'Race' = c("Black", "White", "Asian", "Asian", "Black"))

What I am trying to do is collapse any name that appears with more than one race into a single row with a "multi-racial" category.

The end goal is to construct a dataframe like below:

df1 <- data.frame('Name' = c("Jon", "Bobby", "Sarah", "Fred"),
                 'Race' = c("multi-racial", "Asian", "Asian", "Black"))

My current approach is to deduplicate the name/race pairs, count the distinct races per name, and then replace the race with "multi-racial" for every name that has more than one. Code shown below:

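library(dplyr)

# Keep one row per distinct Name/Race pair
df1 <- unique(df[, c('Name', 'Race')])

# Count the distinct races recorded for each name
multi_answer <-
  df1 %>%
  group_by(Name) %>%
  summarise(n_answers = n_distinct(Race))

# Relabel names with more than one race, then drop the resulting duplicate rows
multi_answer <- multi_answer[multi_answer$n_answers > 1, ]
df1[df1$Name %in% multi_answer$Name, 'Race'] <- 'multi-racial'
df1 <- unique(df1)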

Solution

  • You can group by Name and then summarise, using the condition "there is more than one entry in the group" (i.e., n() > 1). Because that test is a single value, ifelse() returns a single value per group:

    library(tidyverse)
    
    df |>
      group_by(Name) |>
      summarise(Race = ifelse(n() > 1, "multi-racial", Race))
    #> # A tibble: 4 x 2
    #>   Name  Race        
    #>   <chr> <chr>       
    #> 1 Bobby Asian       
    #> 2 Fred  Black       
    #> 3 Jon   multi-racial
    #> 4 Sarah Asian
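
One caveat with n() > 1 is that it counts rows, so a name that appears twice with the same race would also be labelled "multi-racial". A minimal sketch of a variant that counts distinct races instead, as the question's own code does with n_distinct(), might look like the following. The extra Sarah row is hypothetical and added only to illustrate the difference; it is not part of the original data.

```r
library(dplyr)

# Data from the question, plus a hypothetical duplicate row for Sarah
df <- data.frame(Name = c("Jon", "Jon", "Bobby", "Sarah", "Sarah", "Fred"),
                 Race = c("Black", "White", "Asian", "Asian", "Asian", "Black"))

result <- df |>
  group_by(Name) |>
  # Relabel only when a name has more than one distinct race;
  # otherwise keep that name's single race
  summarise(Race = if (n_distinct(Race) > 1) "multi-racial" else first(Race))

print(result)
```

Here Jon is relabelled because two different races are recorded for him, while Sarah stays "Asian" despite having two rows. A plain if/else is used rather than ifelse() since the condition is a single value per group.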