How to create a column based on multiple criteria in R?


Currently I have a variable "Sex" that contains 1s and 2s for men and women, respectively. I want to add random noise to this variable, so I generated random numbers from a normal distribution. The next step is to determine whether some of the values have to change to the other sex. I use z-values of 2 and -2 as boundaries: if a man (1) is assigned a value > 2, the value has to change to woman (2). It also works the other way around: when a woman (2) is assigned a random z-value < -2, the Sex variable has to change to man (1). In all other cases, the value has to stay the same.

I thought an ifelse statement would do the trick. Unfortunately, it did not work. My statement looks like this:

with(Dataset18$New_sex,
     ifelse(Sex== 1 & Norm_dist_random > 2, 2 , ifelse(Sex== 1 & Norm_dist_random <= 2, 1, 
     ifelse(Sex== 2 & Norm_dist_random < -2, 1, ifelse(Sex== 2 & Norm_dist_random >= -2, 2))))
)

My data looks like this:

Sex     Norm_dist_random
 1         0.622221897
 1         2.573726407
 1        -0.298095612
 1         0.717745305
 2        -2.597695772
 2         2.534427904
 2         0.089732903
 2        -0.329274570
 2        -1.173434147

In the end, my data has to look like this:

Sex     Norm_dist_random   Sex_new
 1         0.622221897        1
 1         2.573726407        2
 1        -0.298095612        1
 1         0.717745305        1
 2        -2.597695772        1
 2         2.534427904        2
 2         0.089732903        2
 2        -0.329274570        2
 2        -1.173434147        2

Solution

  • One approach is case_when from dplyr, which accepts an arbitrary number of condition-value pairs. Each argument has a left-hand side that evaluates to TRUE or FALSE and a right-hand side that defines the value to assign. The two sides are separated by ~.

    Conditions are evaluated in order, and each row gets the value of the first condition that is TRUE. I added TRUE ~ NA_real_ to catch any rows that don't fulfill any of the conditions.

    library(dplyr)
    Dataset18 %>% 
      mutate(Sex_new = case_when(Sex == 1 & Norm_dist_random <= 2 ~ 1,
                                 Sex == 1 & Norm_dist_random > 2 ~ 2,
                                 Sex == 2 & Norm_dist_random < -2 ~ 1,
                                 Sex == 2 & Norm_dist_random >= -2 ~ 2,
                                 TRUE ~ NA_real_))
    #  Sex Norm_dist_random Sex_new
    #1   1        0.6222219       1
    #2   1        2.5737264       2
    #3   1       -0.2980956       1
    #4   1        0.7177453       1
    #5   2       -2.5976958       1
    #6   2        2.5344279       2
    #7   2        0.0897329       2
    #8   2       -0.3292746       2
    #9   2       -1.1734341       2
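
For comparison, here is a minimal base R sketch of the nested ifelse that the question attempted. It assumes the goal is exactly the rule described above: flip only the rows beyond the boundaries and keep Sex unchanged otherwise. Two things differ from the original attempt: with() receives the data frame itself as its first argument, and every ifelse() has both a yes and a no value. The data frame is rebuilt from the sample rows in the question so that the block is self-contained; the column name Sex_new follows the desired output.

```r
# Sample data copied from the question
Dataset18 <- data.frame(
  Sex = c(1, 1, 1, 1, 2, 2, 2, 2, 2),
  Norm_dist_random = c(0.622221897, 2.573726407, -0.298095612,
                       0.717745305, -2.597695772, 2.534427904,
                       0.089732903, -0.329274570, -1.173434147)
)

# Men (1) with z > 2 become 2, women (2) with z < -2 become 1;
# every other row keeps its original Sex value.
Dataset18$Sex_new <- with(
  Dataset18,
  ifelse(Sex == 1 & Norm_dist_random > 2, 2,
         ifelse(Sex == 2 & Norm_dist_random < -2, 1, Sex))
)

print(Dataset18)
```

Because the final fallback is Sex itself, only the two flipping conditions need to be spelled out, and rows with a missing Sex stay NA.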