ifelse/case_when - new variable based on multiple conditions involving two variables


Update: the problem has already been solved by restoring the data before running the code.

With the help of several answered questions from this forum I wrote my code. However, it does not fully work as expected, and I would appreciate any hints and tips to solve the problem.

My goal is to create a new variable "v_edu_recoded" based on the variables "v_236", which contains the respondent's level of education, and "v_237", which contains the respondent's specification if they chose '6' (meaning "other") in v_236. So the new variable v_edu_recoded should basically be the other two variables merged: it should have the same value as v_236, except when v_236 is '6'. In that case it should be recoded to one of the other numbers depending on the answer, because most people who chose "other" gave an answer that is already covered by the categories of v_236.

My problem is that the output lists only the ten recoded cases (those who chose 6 in v_236). The first part of my condition (all 832 cases who chose 1-5) did not work, and those cases come out as NA.

Any idea how to solve this? (I also tried it via "mutate", but the result was even worse.) Kind regards and thanks a lot for any help!

Here is my code:


dr_ma$v_edu_recoded <- with(dr_ma, ifelse(
  (v_236 == '1' & v_237 == '-99' | v_236 == '6' & v_237 == 'Schüler'), '1', ifelse( 
    (v_236 == '2' & v_237 == '-99'), '2', ifelse(
      (v_236 == '3' & v_237 == '-99'| v_236 == '6' & v_237 == 'Fachabitur'),'3', ifelse(
        (v_236 == '4' & v_237 == '-99' | v_236 == '6' & v_237 == 'Verwaltungsfachwirt'), '4', ifelse(
          (v_236 == '5' & v_237 == '-99'| v_236 == '6' & v_237 == 'Diplom'| v_236 == '6' & v_237 == 'Universität'),'5', ifelse(
            (v_236 == '6' & v_237 == 'meister'|v_236 == '6' & v_237 == 'Meister'|v_236 == '6' & v_237 == 'Fachakademie'),'6',NA
          )))))))

And here my output summary:

> summary(dr_ma$v_edu_recoded)
   Length     Class      Mode 
      842 character character 
> frq(dr_ma$v_edu_recoded)

x <character>
# total N=842  valid N=10  mean=4.60  sd=1.58

Value |   N | Raw % | Valid % | Cum. %
--------------------------------------
    1 |   1 |  0.12 |      10 |     10
    3 |   1 |  0.12 |      10 |     20
    4 |   1 |  0.12 |      10 |     30
    5 |   4 |  0.48 |      40 |     70
    6 |   3 |  0.36 |      30 |    100
 <NA> | 832 | 98.81 |    <NA> |   <NA>

@CPak @caldwellst thank you for the super quick reply! I tried case_when, but I got the same result. My conditions are probably not set right, but I can't find what's wrong.

dr_ma$v_edu_recoded3 <- case_when(
  dr_ma$v_236 == 1 & dr_ma$v_237 == -99 | dr_ma$v_236 == 6 & dr_ma$v_237 == 'Schüler' ~ 1,
  dr_ma$v_236 == 2 & dr_ma$v_237 == -99 ~ 2,
  dr_ma$v_236 == 3 & dr_ma$v_237 == -99 | dr_ma$v_236 == 6 & dr_ma$v_237 == 'Fachabitur' ~ 3,
  dr_ma$v_236 == 4 & dr_ma$v_237 == -99 | dr_ma$v_236 == 6 & dr_ma$v_237 == 'Verwaltungsfachwirt' ~ 4,
  dr_ma$v_236 == 5 & dr_ma$v_237 == -99 | dr_ma$v_236 == 6 & dr_ma$v_237 == 'Diplom' | dr_ma$v_236 == 6 & dr_ma$v_237 == 'Universität' ~ 5,
  dr_ma$v_236 == 6 & dr_ma$v_237 == 'meister' | dr_ma$v_236 == 6 & dr_ma$v_237 == 'Meister' | dr_ma$v_236 == 6 & dr_ma$v_237 == 'Fachakademie' ~ 6,
  TRUE ~ NA_real_
)
summary(dr_ma$v_edu_recoded3)
frq(dr_ma$v_edu_recoded3)
> summary(dr_ma$v_edu_recoded3)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   1.00    4.25    5.00    4.60    5.75    6.00     832 
> frq(dr_ma$v_edu_recoded3)

x <numeric>
# total N=842  valid N=10  mean=4.60  sd=1.58

Value |   N | Raw % | Valid % | Cum. %
--------------------------------------
    1 |   1 |  0.12 |      10 |     10
    3 |   1 |  0.12 |      10 |     20
    4 |   1 |  0.12 |      10 |     30
    5 |   4 |  0.48 |      40 |     70
    6 |   3 |  0.36 |      30 |    100
 <NA> | 832 | 98.81 |    <NA> |   <NA>

Solution

  • The problem was solved by restoring the data and then running the code again. Running

    dput(head(dr_ma, 10))

    as proposed by @CPak showed that the original data had been altered by the many earlier recoding attempts; resetting it to its initial state fixed the problem.
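
To avoid this kind of problem in the first place, one option is to keep an untouched copy of the imported data and run every recoding attempt on a working copy. The following is only a minimal base-R sketch under stated assumptions: `dr_raw`, `other_map` and the toy values are hypothetical stand-ins for the real survey data, and the rule "keep v_236 unless it is 6" follows the description in the question rather than the exact conditions in the posted code.

```r
# Toy data standing in for the real survey (hypothetical values)
dr_raw <- data.frame(
  v_236 = c(1, 2, 3, 6, 6, 6, 5),
  v_237 = c("-99", "-99", "-99", "Fachabitur", "Diplom", "Meister", "-99"),
  stringsAsFactors = FALSE
)

# Work on a copy so the untouched import stays available for a reset
dr_ma <- dr_raw

# Inspect types and value combinations before recoding
str(dr_ma[c("v_236", "v_237")])
table(dr_ma$v_236, dr_ma$v_237, useNA = "ifany")

# Lookup for the free-text answers given under "other" (6),
# using the texts and target codes from the question
other_map <- setNames(
  c(1, 3, 4, 5, 5, 6, 6, 6),
  c("Schüler", "Fachabitur", "Verwaltungsfachwirt", "Diplom",
    "Universität", "meister", "Meister", "Fachakademie")
)

# Keep v_236 as it is, except where it is 6: there, look up v_237.
# Free-text answers missing from other_map come out as NA.
is_other <- dr_ma$v_236 == 6
dr_ma$v_edu_recoded <- ifelse(
  is_other,
  unname(other_map[as.character(dr_ma$v_237)]),
  dr_ma$v_236
)

dr_ma$v_edu_recoded
```

If a recoding attempt goes wrong, `dr_ma <- dr_raw` resets the working copy without re-importing the file.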