I need to produce some new factor variables in my dataset which contain information from existing factor variables.
In the first case I need to produce a binary NewVariable based on whether certain values occur in a specific variable which has more than 100 levels. I use the revalue() from the plyr package Namely,
NewVar <- if(OldVar1=="helen" | OldVar1=="greg")
{NewVar <-revalue(OldVar1, c("helen"="participant", "greg"="participant"))}
else {NewVar=="nonparticipant"}
I actually want to collapse specific levels into a specific level from the new variable. As you can imagine the above code does not work but I cannot figure out why.
In the second case I need to combine information from three existing factor variables (OldVar1, OldVar2, OldVar3) in order to fill in the levels of a multi-categorical NewVariable, I run this code,
NewVariable="OptionA" <- if(OldVar1=="a" & OldVar2=="b" & OldVar3=="c")
I get an error "Error: unexpected '=' in "OldVar=" the same occurs when I remove one of the = in the OldVar1=="a"
Is it possible to create a factor NewVariable with its levels and labels without filling them with the string values in advance? I was not able to find something on that, the tutorials I see have produced their data and they just have to label the existing values.
Also, I would like to give values to the rest of my cases who either belong to OptionA, OptionB, OptionC, etc, will this be possible setting a different if-statement for each one of them as the following?
NewVariable="OptionA" <- if(OldVar1=="a" & OldVar2=="b" & OldVar3=="c")
NewVariable="OptionB" <- if(OldVar1=="a" & OldVar2=="d" & OldVar3=="e")
=== EDIT ===
For the second "challenge" I followed the code suggested by DWin I produced an interaction of my three variables that I have in the if(...) above and set inside c() only the values that I needed, for example
OldVar.ALL.interactions <- with(data, interaction(OldVar1, OldVar2, OldVar3)
levels(OldVar.ALL.interactions) # search for the levels that we need to include
# in the NewVar
# below I follow DWin's code
NewVar <- factor(rep(NA, length(AnotherVarOfTheDataset) ),
levels=c("OptionA", "OptionB", ...))
NewVar[OldVar.ALL.interactions %in% c("...interaction.of.Old.Variables...")] <- "OptionA"
# the same as in OptionA for the rest of the levels
# the ** NewVar[ is.na(NewVar) ] <- "nonparticipant" ** of DWin's code is not needed
Is there any other way to solve this issue without using the interaction between the Old factor variables?
I'd probably start out with an empty factor variable (assuming that you wanted to have a factor as was implied by the subject line):
NewVar <- factor(rep(NA, length(OldVar) ),
levels=c("participant", "nonparticipant") )
NewVar[ OldVar %in% c("a", "b", "c")] <- "participant"
NewVar[ is.na(NewVar) ] <- "nonparticipant"
If you don't mind having a character vector than somethingalong these lines:
y <- vector("character",length(x))
y[ x %in% c("a","c")] <- "p"
y[ !x %in% c("a","c")] <- "np"
y
#[1] "p" "np" "p"