I have reshaped my data into long format using the mlogit.data function so that I can fit a multinomial logit model with mlogit in R. The data has three different "choices" and looks like this (in its long format, one row per observation and choice):
Observation  Choice  Variable A  Variable B  Variable C
1             1      1.27        0.20        0.81
1             0      1.27        0.20        0.81
1            -1      1.27        0.20        0.81
2             1      0.20        0.45        0.70
2             0      0.20        0.45        0.70
2            -1      0.20        0.45        0.70
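However, variables A, B and C each belong to one of the choices (A to choice 1, B to choice 0, C to choice -1), so I would now like to create a new variable D that holds, in each row, the value of the variable matching that row's choice: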
Observation  Choice  Variable A  Variable B  Variable C  Variable D
1             1      1.27        0.20        0.81        1.27
1             0      1.27        0.20        0.81        0.20
1            -1      1.27        0.20        0.81        0.81
2             1      0.20        0.45        0.70        0.20
2             0      0.20        0.45        0.70        0.45
2            -1      0.20        0.45        0.70        0.70
I have tried the following code:
Variable D <- ifelse(Choice == "1", Variable A, ifelse(Choice == "-1", Variable B, Variable C))
However, the ifelse function only considers one choice from each observation, creating this:
Observation  Choice  Variable A  Variable B  Variable C  Variable D
1             1      1.27        0.20        0.81        1.27
1             0      1.27        0.20        0.81        -
1            -1      1.27        0.20        0.81        -
2             1      0.20        0.45        0.70        -
2             0      0.20        0.45        0.70        0.2
2            -1      0.20        0.45        0.70        -
Does anyone know how to solve this?
Thanks!
You can create a table mapping each choice to the index of its column, and then use match to look up, for every row, which column to read the value from:
# Map each choice value to the position of the column holding its variable
choice_map <-
data.frame(choice = c(1, 0, -1), var = grep('Variable[A-C]', names(df)))
#   choice var
# 1      1   3
# 2      0   4
# 3     -1   5
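# Index df with a two-column matrix of (row, column) pairs: one cell per row,
# taken from the column that match() finds for that row's Choice
df$VariableD <-
df[cbind(seq_len(nrow(df)), with(choice_map, var[match(df$Choice, choice)]))]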
df
#   Observation Choice VariableA VariableB VariableC VariableD
# 1           1      1      1.27      0.20      0.81      1.27
# 2           1      0      1.27      0.20      0.81      0.20
# 3           1     -1      1.27      0.20      0.81      0.81
# 4           2      1      0.20      0.45      0.70      0.20
# 5           2      0      0.20      0.45      0.70      0.45
# 6           2     -1      0.20      0.45      0.70      0.70
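The extraction step relies on a base R feature that may not be obvious: indexing with a two-column matrix selects one element per (row, column) pair instead of a whole block. Here is a minimal sketch on a toy matrix; the names m and idx are made up for illustration and are not part of the data in the question.

```r
# Toy 2 x 3 matrix, filled column-wise:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# Each row of idx is one (row, column) pair to extract
idx <- cbind(c(1, 2), c(3, 1))

m[idx]
# [1] 5 2
```

In the answer, the first column of the index matrix is simply every row number, and the second column is the column position looked up from choice_map.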
Data used (spaces removed from the column names):
df <- data.table::fread('
Observation Choice VariableA VariableB VariableC
1 1 1.27 0.2 0.81
1 0 1.27 0.2 0.81
1 -1 1.27 0.2 0.81
2 1 0.20 0.45 0.70
2 0 0.20 0.45 0.70
2 -1 0.20 0.45 0.70
', data.table = FALSE)
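For only three choices, a nested ifelse() may also be enough, provided the conditions follow the intended pairing (1 with VariableA, 0 with VariableB, -1 with VariableC). Note that the attempt in the question sends -1 to Variable B, which does not match the desired output. The following is a sketch, not a drop-in for mlogit data: it rebuilds the example data with base R only and assumes Choice is stored as a number and the column names contain no spaces.

```r
# Same example data as above, built with base R only
df <- data.frame(
  Observation = rep(1:2, each = 3),
  Choice      = rep(c(1, 0, -1), times = 2),
  VariableA   = rep(c(1.27, 0.20), each = 3),
  VariableB   = rep(c(0.20, 0.45), each = 3),
  VariableC   = rep(c(0.81, 0.70), each = 3)
)

# ifelse() is vectorised, so every row is evaluated:
# 1 -> VariableA, 0 -> VariableB, anything else (here -1) -> VariableC
df$VariableD <- with(df, ifelse(Choice == 1, VariableA,
                         ifelse(Choice == 0, VariableB, VariableC)))

df$VariableD
# [1] 1.27 0.20 0.81 0.20 0.45 0.70
```

The match-based lookup scales better when there are many choices, since adding one only means adding a row to the mapping table.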