Creating new variable in wide data format, R


I have transformed my data into a wide format using the mlogit.data function in order to be able to perform an mlogit multinomial logit regression in R. The data has three different "choices" and looks like this (in its wide format):

Observation  Choice  Variable A  Variable B  Variable C
     1          1         1.27       0.2         0.81        
     1          0         1.27       0.2         0.81           
     1         -1         1.27       0.2         0.81 
     2          1         0.20       0.45        0.70
     2          0         0.20       0.45        0.70      
     2         -1         0.20       0.45        0.70

However, variables A, B and C each belong to one of the outcomes, so I would now like to create a new variable D that takes, for each row, the value of the variable matching that row's choice:

Observation  Choice  Variable A  Variable B  Variable C  Variable D
     1          1         1.27       0.2         0.81        1.27
     1          0         1.27       0.2         0.81        0.2
     1         -1         1.27       0.2         0.81        0.81
     2          1         0.20       0.45        0.70        0.20
     2          0         0.20       0.45        0.70        0.45
     2         -1         0.20       0.45        0.70        0.70

I have tried the following code:

Variable D <- ifelse(Choice == "1", Variable A, ifelse(Choice == "-1", Variable B, Variable C))

However, the ifelse function only considers one choice from each observation, creating this:

Observation  Choice  Variable A  Variable B  Variable C  Variable D
     1          1         1.27       0.2         0.81        1.27
     1          0         1.27       0.2         0.81         -
     1         -1         1.27       0.2         0.81         -
     2          1         0.20       0.45        0.70         -
     2          0         0.20       0.45        0.70        0.2
     2         -1         0.20       0.45        0.70         -

Anyone know how to solve this?

Thanks!


Solution

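  • You can create a table mapping each choice to the position of its column, then use match to look up that position for every row and index the data frame with (row, column) pairs: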

    choice_map <- 
      data.frame(choice = c(1, 0, -1), var = grep('Variable[A-C]', names(df)))
    
    #   choice var
    # 1      1   3
    # 2      0   4
    # 3     -1   5
    
    
    df$VariableD <- 
      df[cbind(seq_len(nrow(df)), with(choice_map, var[match(df$Choice, choice)]))]
    
    
    df
    #   Observation Choice VariableA VariableB VariableC VariableD
    # 1           1      1      1.27      0.20      0.81      1.27
    # 2           1      0      1.27      0.20      0.81      0.20
    # 3           1     -1      1.27      0.20      0.81      0.81
    # 4           2      1      0.20      0.45      0.70      0.20
    # 5           2      0      0.20      0.45      0.70      0.45
    # 6           2     -1      0.20      0.45      0.70      0.70
    

    Data used (spaces removed from the column names):

    df <- data.table::fread('
    Observation  Choice  VariableA  VariableB  VariableC
         1          1         1.27       0.2         0.81        
         1          0         1.27       0.2         0.81           
         1         -1         1.27       0.2         0.81 
         2          1         0.20       0.45        0.70
         2          0         0.20       0.45        0.70      
         2         -1         0.20       0.45        0.70
    ', data.table = FALSE)
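
The indexing step in the solution packs several operations into one line, so here is a minimal sketch that unpacks it in base R. It assumes the column names without spaces used in the answer; choice_cols, col_pos, values, idx and via_ifelse are illustrative names that do not appear in the original. As a cross-check it also computes the same column with a nested ifelse, using the mapping implied by the desired output (1 to A, 0 to B, -1 to C), which differs from the mapping in the attempted code in the question.

```r
# Rebuild the example data in base R (same values as the table in the question).
df <- data.frame(
  Observation = rep(1:2, each = 3),
  Choice      = rep(c(1, 0, -1), times = 2),
  VariableA   = rep(c(1.27, 0.20), each = 3),
  VariableB   = rep(c(0.20, 0.45), each = 3),
  VariableC   = rep(c(0.81, 0.70), each = 3)
)

# Assumed mapping, in this order: 1 -> VariableA, 0 -> VariableB, -1 -> VariableC.
choice_cols <- c("VariableA", "VariableB", "VariableC")

# Step 1: for every row, find which of the three columns its Choice points to.
# match() returns the position (1, 2 or 3) of each Choice value in c(1, 0, -1).
col_pos <- match(df$Choice, c(1, 0, -1))

# Step 2: pair each row number with its column position. Indexing a matrix
# with a two-column matrix picks exactly one cell per (row, column) pair.
values <- as.matrix(df[choice_cols])
idx    <- cbind(seq_len(nrow(df)), col_pos)
df$VariableD <- values[idx]

# Cross-check: ifelse() is vectorised, so a nested call with the same
# mapping should give the same result row by row.
via_ifelse <- with(df, ifelse(Choice == 1, VariableA,
                       ifelse(Choice == 0, VariableB, VariableC)))
stopifnot(isTRUE(all.equal(df$VariableD, via_ifelse)))

print(df)
```

The lookup version scales more easily than nested ifelse calls when there are more than three alternatives, since only the two vectors describing the mapping need to change.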