I have been trying to learn the best way to recode the values in a column when a name is associated with more than one race.
I have been working with a dataframe like this:
df <- data.frame('Name' = c("Jon", "Jon", "Bobby", "Sarah", "Fred"),
'Race' = c("Black", "White", "Asian", "Asian", "Black"))
What I am trying to do is collapse any name that appears with more than one race into a single row with a "multi-racial" category.
The end goal is to construct a dataframe like below:
df1 <- data.frame('Name' = c("Jon", "Bobby", "Sarah", "Fred"),
'Race' = c("multi-racial", "Asian", "Asian", "Black"))
My current approach is to count the distinct races recorded for each name, pick out the names with more than one answer, and replace the race with "multi-racial" for those names only. Code shown below:
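library(dplyr)  # provides %>% and n_distinct()

# Drop exact duplicate Name/Race rows
df1 <- unique(df[, c('Name', 'Race')])
# Count the distinct races recorded for each name
multi_answer <-
  df1 %>%
  dplyr::group_by(Name) %>%
  dplyr::summarise(n_answers = n_distinct(Race))
# Keep only the names with more than one race
multi_answer <- multi_answer[multi_answer$n_answers > 1, ]
# Recode those names, then collapse the now-identical rows
df1[df1$Name %in% multi_answer$Name, 'Race'] <- 'multi-racial'
df1 <- unique(df1)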
You can just group_by the Name and then summarize the data. You just use the condition of "if there is more than one entry" (i.e., n() > 1):
library(tidyverse)

df |>
  group_by(Name) |>
  summarise(Race = ifelse(n() > 1, "multi-racial", Race))
#> # A tibble: 4 x 2
#> Name Race
#> <chr> <chr>
#> 1 Bobby Asian
#> 2 Fred Black
#> 3 Jon multi-racial
#> 4 Sarah Asian
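
One caveat: n() > 1 counts rows, so a name that is listed twice with the same race would also be recoded. If that can happen in your data, here is a minimal sketch that counts distinct races instead, using n_distinct() as in the question. The duplicated Sarah row is an assumption added purely to illustrate the difference; it is not in the original data.

```r
library(dplyr)

# Example data from the question, plus one duplicated row for Sarah
# (the duplicate is hypothetical, added only for illustration).
df <- data.frame(
  Name = c("Jon", "Jon", "Bobby", "Sarah", "Sarah", "Fred"),
  Race = c("Black", "White", "Asian", "Asian", "Asian", "Black")
)

result <- df |>
  group_by(Name) |>
  # Recode only when a name has more than one *distinct* race;
  # otherwise keep the single race recorded for that name.
  summarise(Race = if (n_distinct(Race) > 1) "multi-racial" else first(Race))

print(result)
```

With this version Sarah should stay "Asian" even though she has two rows, while Jon is still recoded because his two rows carry different races.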