Tags: dplyr, case-when

Creating a new variable based on conditions on two other variables


I'm trying to create a new variable in a dataset based on conditions on other variables. Basically, I want to simplify the information about the parents' education, which is split between father and mother, into a single variable that holds the highest education level of the two parents. For example, if the father's education level is 1 and the mother's is 0, the value of the new variable for that row would be 1.

I'm trying to use mutate() with case_when(), which worked for another variable, but I don't understand why it isn't working now. When I try, it creates a column containing only NAs, and when I print a table of it, the result is:

< table of extent 0 >

The class of the two variables that I'm using for conditions is 'labelled' and 'factor'.

First, I tried the following command (the code is simplified):

dataset <- dataset %>% 
           mutate(NEW_EDUCATIONAL_VAR = case_when(MOTHER_EDUCATIONAL_VAR == '0' & FATHER_EDUCATIONAL_VAR == '0' ~ '0',
                                                  MOTHER_EDUCATIONAL_VAR == '0' & FATHER_EDUCATIONAL_VAR == '1' ~ '1'))

Then I tried to account for the cases with NA values, since some rows contain NA:

dataset <- dataset %>% 
           mutate(NEW_EDUCATIONAL_VAR = case_when(is.na(MOTHER_EDUCATIONAL_VAR) & is.na(FATHER_EDUCATIONAL_VAR) ~ '99',
                                                  MOTHER_EDUCATIONAL_VAR == '0' & FATHER_EDUCATIONAL_VAR == '1' ~ '1'))

When I used the same functions to create a new variable for age, it worked:

dataset <- dataset %>% mutate(AGE_CAT = case_when(AGE >= 16 & AGE <= 18 ~ '0',
                                                   AGE >= 19 & AGE <= 24 ~ '1',
                                                   AGE >= 25 & AGE <= 29 ~ '2',
                                                   AGE >= 30 ~ '3'))

So, what am I doing wrong? Thanks a lot.


Solution

  • The example below uses numeric columns and an explicit NA condition; you can play around with the values to match your data. Hope this helps.

    #packages
    library(tidyverse)
    
    #sample data
    Mother <- c(0,0,0,1,1,NA)
    Father <- c(0,1,1,0,0,1)
    df <- data.frame(Mother, Father)
    str(df) #both Mother and Father columns are numeric
    
    #mutate + case_when
    df %>% 
      mutate(New = case_when(Mother == 0 & Father == 0 ~ 0, #condition 1
                             Mother == 0 & Father == 1 ~ 1, #condition 2
                             is.na(Mother) & Father == 1 ~ NA_real_, #condition 3
                             TRUE ~ 99)) #all other cases
    

    Output:

      Mother Father New
    1      0      0   0
    2      0      1   1
    3      0      1   1
    4      1      0  99
    5      1      0  99
    6     NA      1  NA
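
Since the stated goal is the higher of the two parents' levels, a shorter alternative might be to take the row-wise maximum instead of listing every combination. The following is only a minimal sketch in base R under some assumptions: the sample data is made up, the columns are plain factors whose labels are digits (a 'labelled' column may need a different conversion), and `to_level` is a hypothetical helper name.

```r
# Hypothetical sample data: education levels stored as factors
df <- data.frame(
  Mother = factor(c("0", "0", "1", NA, NA)),
  Father = factor(c("0", "1", "0", "1", NA))
)

# Convert a factor with digit labels to numbers.
# as.numeric() alone would return the internal level codes (1, 2, ...),
# so the factor is converted to character first.
to_level <- function(x) as.numeric(as.character(x))

# pmax() takes the element-wise (row-wise) maximum; with na.rm = TRUE
# a row is NA only when both parents are NA
df$New <- pmax(to_level(df$Mother), to_level(df$Father), na.rm = TRUE)

df
```

The same `pmax()` call could be placed inside `mutate()` if you prefer to stay in a dplyr pipeline. Rows where only one parent is missing take the other parent's level; if you would rather keep those as NA, drop `na.rm = TRUE`.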