Tags: r, variables, loops, if-statement, r-factor

Creating factor variables from levels of other factor variables with an if statement


I need to produce some new factor variables in my dataset which contain information from existing factor variables.

In the first case I need to produce a binary NewVariable based on whether certain values occur in a specific variable which has more than 100 levels. I use revalue() from the plyr package, namely:

NewVar <- if(OldVar1=="helen" | OldVar1=="greg") 
             {NewVar <-revalue(OldVar1, c("helen"="participant", "greg"="participant"))}
          else {NewVar=="nonparticipant"}

What I actually want is to collapse specific levels of the old variable into a single level of the new variable. As you can imagine, the code above does not work, but I cannot figure out why.
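The if version fails because if() is not vectorised: it needs a single TRUE or FALSE, while OldVar1=="helen" | OldVar1=="greg" returns one value per row (R 4.2.0 and later stop with an error that the condition has length > 1; older versions warned and used only the first element). In addition, NewVar=="nonparticipant" in the else branch is a comparison, not an assignment. The sketch below uses a small made-up OldVar1 in place of the real 100+-level variable and shows two vectorised alternatives: ifelse() wrapped in factor(), and renaming the levels directly with levels<-, where giving several old levels the same new name merges them into one (plyr's revalue() also works on the levels, in a similar way).

```r
# Hypothetical stand-in for the real 100+-level factor
OldVar1 <- factor(c("helen", "greg", "anna", "helen", "tom"))

# Approach 1: %in% gives one TRUE/FALSE per element,
# which ifelse() (unlike if()) handles element by element
is_participant <- OldVar1 %in% c("helen", "greg")
NewVar <- factor(ifelse(is_participant, "participant", "nonparticipant"),
                 levels = c("participant", "nonparticipant"))

# Approach 2: rename the levels themselves; old levels that receive
# the same new name are merged into a single level
NewVar2 <- OldVar1
lv <- levels(NewVar2)
levels(NewVar2) <- ifelse(lv %in% c("helen", "greg"),
                          "participant", "nonparticipant")

table(NewVar)
table(NewVar2)
```

The second approach only touches the level names, so with more than 100 levels it stays a single line no matter how many rows the data has.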

In the second case I need to combine information from three existing factor variables (OldVar1, OldVar2, OldVar3) in order to fill in the levels of a multi-categorical NewVariable. I run this code:

NewVariable="OptionA" <- if(OldVar1=="a" & OldVar2=="b" & OldVar3=="c")

I get the error: Error: unexpected '=' in "OldVar=". The same happens when I remove one of the = signs in OldVar1=="a".

Is it possible to create a factor NewVariable with its levels and labels without filling it with the string values in advance? I was not able to find anything on this; the tutorials I have seen already have their data and only need to label the existing values.
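For what it's worth, base R does allow a factor's levels to be declared before any value is filled in. A minimal sketch with a hypothetical length n: every element starts as NA, the declared levels already exist (each with a count of zero), and values are assigned later by subsetting. The labels argument of factor() is only needed when existing codes should be shown under different names.

```r
n <- 6  # hypothetical number of rows
NewVariable <- factor(rep(NA, n),
                      levels = c("OptionA", "OptionB", "OptionC"))

levels(NewVariable)  # "OptionA" "OptionB" "OptionC", although every value is NA
table(NewVariable)   # each declared level is listed with a count of 0

# Values can be filled in afterwards, one subset at a time
NewVariable[c(1, 4)] <- "OptionB"
NewVariable[2]       <- "OptionA"
table(NewVariable)
```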

Also, I would like to assign values to the rest of my cases, each of which belongs to OptionA, OptionB, OptionC, etc. Will this be possible by setting a different if statement for each of them, as follows?

NewVariable="OptionA" <- if(OldVar1=="a" & OldVar2=="b" & OldVar3=="c")
NewVariable="OptionB" <- if(OldVar1=="a" & OldVar2=="d" & OldVar3=="e")
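A separate if() per option will not work here either, because if() expects a single TRUE/FALSE while these comparisons return one value per row (and == compares, whereas <- or = assigns). Each option can instead be written as a vectorised condition, with & combining the element-wise comparisons, and used as an index into a factor whose levels are declared up front. This is a minimal sketch on a hypothetical data frame dat; the "other" level for rows that match no rule is also an assumption:

```r
# Hypothetical data with three old factor variables
dat <- data.frame(
  OldVar1 = factor(c("a", "a", "a", "x")),
  OldVar2 = factor(c("b", "d", "b", "b")),
  OldVar3 = factor(c("c", "e", "z", "c"))
)

# Empty factor with all target levels declared in advance
NewVariable <- factor(rep(NA, nrow(dat)),
                      levels = c("OptionA", "OptionB", "other"))

# One vectorised rule per option; & combines the comparisons row by row
isA <- with(dat, OldVar1 == "a" & OldVar2 == "b" & OldVar3 == "c")
isB <- with(dat, OldVar1 == "a" & OldVar2 == "d" & OldVar3 == "e")
NewVariable[isA] <- "OptionA"
NewVariable[isB] <- "OptionB"

# Rows that matched no rule are still NA; give them a default level
NewVariable[is.na(NewVariable)] <- "other"
NewVariable
```

If two rules can match the same row, the later assignment wins, so the rules should be ordered accordingly.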

=== EDIT ===

For the second "challenge" I followed the code suggested by DWin: I created an interaction of the three variables used in the if(...) above and put inside c() only the values I needed, for example:

OldVar.ALL.interactions <- with(data, interaction(OldVar1, OldVar2, OldVar3))
levels(OldVar.ALL.interactions)  # find the levels to include in NewVar
# below I follow DWin's code
NewVar <- factor(rep(NA, length(AnotherVarOfTheDataset)),
                 levels = c("OptionA", "OptionB", ...))
NewVar[OldVar.ALL.interactions %in% c("...interaction.of.Old.Variables...")] <- "OptionA"
# the same as for OptionA for the rest of the levels
# DWin's NewVar[is.na(NewVar)] <- "nonparticipant" step is not needed here

Is there any other way to solve this without using the interaction between the old factor variables?
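One alternative to interaction(), named here as a different technique from the one in the edit, is a lookup table: list the combinations that define each option as rows of a small data frame, build a key from the three values of each row, and match() the data against it. Rows with no matching combination get NA. The data frame dat, the lookup table and the "|" separator are hypothetical choices for this sketch:

```r
# Hypothetical data with three old factor variables
dat <- data.frame(
  OldVar1 = factor(c("a", "a", "a", "x")),
  OldVar2 = factor(c("b", "d", "b", "b")),
  OldVar3 = factor(c("c", "e", "z", "c"))
)

# Hypothetical lookup table: one row per combination that defines an option
lookup <- data.frame(
  OldVar1     = c("a", "a"),
  OldVar2     = c("b", "d"),
  OldVar3     = c("c", "e"),
  NewVariable = c("OptionA", "OptionB")
)

# Build one key per row; paste() uses the factor labels.
# Assumes "|" never occurs inside the values themselves.
key_dat    <- paste(dat$OldVar1, dat$OldVar2, dat$OldVar3, sep = "|")
key_lookup <- paste(lookup$OldVar1, lookup$OldVar2, lookup$OldVar3, sep = "|")

# match() returns the lookup row for each data row, or NA if there is none
dat$NewVariable <- factor(lookup$NewVariable[match(key_dat, key_lookup)],
                          levels = c("OptionA", "OptionB"))
dat
```

Adding an option then means adding a row to lookup rather than another line of code.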


Solution

  • I'd probably start out with an empty factor variable (assuming that you wanted to have a factor as was implied by the subject line):

    NewVar <- factor(rep(NA, length(OldVar) ), 
                     levels=c("participant", "nonparticipant") )   
    NewVar[ OldVar %in% c("a", "b", "c")] <- "participant"
    NewVar[ is.na(NewVar) ]             <- "nonparticipant"
    

    If you don't mind having a character vector, then something along these lines:

    x <- c("a", "b", "c")  # example input
    y <- vector("character", length(x))
    y[ x %in% c("a","c")] <- "p"
    y[ !x %in% c("a","c")] <- "np"
    y
    #[1] "p"  "np" "p"
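A run of the first approach on a small, made-up OldVar shows why the levels are declared up front: assigning a value that is not one of a factor's levels does not create a new level. R warns "invalid factor level, NA generated" and stores NA instead. The values of OldVar and the extra label "observer" are hypothetical:

```r
OldVar <- factor(c("a", "d", "c", "b", "e"))  # hypothetical input

NewVar <- factor(rep(NA, length(OldVar)),
                 levels = c("participant", "nonparticipant"))
NewVar[OldVar %in% c("a", "b", "c")] <- "participant"
NewVar[is.na(NewVar)]                <- "nonparticipant"
table(NewVar)

# "observer" is not a declared level: R warns
# (invalid factor level, NA generated) and stores NA
Bad <- NewVar
suppressWarnings(Bad[1] <- "observer")
is.na(Bad[1])
levels(Bad)  # still only "participant" and "nonparticipant"
```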