I have a dataset, where a couple of variables are dichotomised as yes/no.
> df[1:20,]
# A tibble: 20 × 2
black white
<fct> <fct>
1 No Yes
2 No Yes
3 No Yes
4 No Yes
5 No Yes
6 No Yes
7 No Yes
8 No Yes
9 No Yes
10 No Yes
11 No Yes
12 No Yes
13 No Yes
14 No Yes
15 No Yes
16 Yes No
17 No Yes
18 No Yes
19 No Yes
20 Yes No
This creates a lot of variables (my real data has more than one option for race) and doesn’t look very neat as it means a lot of variables unnecessarily. I want to create a new variable (e.g 'race'), where the now individual variables 'black', 'white' etc are levels to that variable. Like in this example
> df2[1:20,]
# A tibble: 20 × 1
race
<fct>
1 White
2 White
3 White
4 White
5 White
6 White
7 White
8 White
9 White
10 White
11 White
12 White
13 White
14 White
15 White
16 Black
17 White
18 White
19 White
20 Black
How would I go about this?
Here is a solution that works with multiracial cases.
library(tidyverse)
# Sample data with multiracial case
df <- structure(list(respondent = 1:20, asian = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes"), black = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes", "No", "No", "No", "Yes"), white = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))
df %>%
select(asian:white) %>%
`==`("Yes") %>%
apply(1,
\(.row) colnames(.)[.row] %>%
str_c(collapse = "-"))
#> [1] "white" "white" "white" "white" "white"
#> [6] "white" "white" "white" "white" "white"
#> [11] "white" "white" "white" "white" "white"
#> [16] "black" "white" "white" "white" "asian-black"
Created on 2022-04-04 by the reprex package (v2.0.1)