Search code examples
rstringdataframedata-manipulation

How to create a variable with other dataset variables as its levels


I have a dataset, where a couple of variables are dichotomised as yes/no.

> df[1:20,]
# A tibble: 20 × 2
   black white
   <fct> <fct>
 1 No    Yes  
 2 No    Yes  
 3 No    Yes  
 4 No    Yes  
 5 No    Yes  
 6 No    Yes  
 7 No    Yes  
 8 No    Yes  
 9 No    Yes  
10 No    Yes  
11 No    Yes  
12 No    Yes  
13 No    Yes  
14 No    Yes  
15 No    Yes  
16 Yes   No   
17 No    Yes  
18 No    Yes  
19 No    Yes  
20 Yes   No 

This creates a lot of variables (my real data has more than one option for race) and doesn’t look very neat as it means a lot of variables unnecessarily. I want to create a new variable (e.g 'race'), where the now individual variables 'black', 'white' etc are levels to that variable. Like in this example

> df2[1:20,]
# A tibble: 20 × 1
   race 
   <fct>
 1 White
 2 White
 3 White
 4 White
 5 White
 6 White
 7 White
 8 White
 9 White
10 White
11 White
12 White
13 White
14 White
15 White
16 Black
17 White
18 White
19 White
20 Black

How would I go about this?


Solution

  • Here is a solution that works with multiracial cases.

    library(tidyverse)
    
    # Sample data with multiracial case
    df <- structure(list(respondent = 1:20, asian = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes"), black = c("No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "Yes", "No", "No", "No", "Yes"), white = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "No")), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))
    
    df %>%
      select(asian:white) %>%
      `==`("Yes") %>%
      apply(1, 
            \(.row) colnames(.)[.row] %>%
              str_c(collapse = "-")) 
    #>  [1] "white"       "white"       "white"       "white"       "white"      
    #>  [6] "white"       "white"       "white"       "white"       "white"      
    #> [11] "white"       "white"       "white"       "white"       "white"      
    #> [16] "black"       "white"       "white"       "white"       "asian-black"
    

    Created on 2022-04-04 by the reprex package (v2.0.1)