
Choice experiment data: mlogit Exercise 3 gives "Error in reshapeLong(...): 'varying' arguments must be the same length"


I am following Exercise 3 of the mlogit package (https://cran.r-project.org/web/packages/mlogit/vignettes/e3mxlogit.html), but using my own data (see below):

structure(list(Choice.Set = c(4L, 5L, 7L, 8L, 10L, 12L), Alternative = c(2L, 
1L, 1L, 2L, 2L, 2L), respondent = c(1L, 1L, 1L, 1L, 1L, 1L), 
    code = c(7L, 9L, 13L, 15L, 19L, 23L), Choice = c(1L, 1L, 
    1L, 1L, 1L, 1L), price1 = c(0L, 0L, 1L, 1L, 0L, 0L), price2 = c(0L, 
    1L, 0L, 0L, 1L, 1L), price3 = c(0L, 0L, 0L, 0L, 0L, 0L), 
    price4 = c(1L, 0L, 0L, 0L, 0L, 0L), price5 = c(0L, 0L, 0L, 
    0L, 0L, 0L), zone1 = c(0L, 0L, 0L, 1L, 1L, 1L), zone2 = c(0L, 
    0L, 0L, 0L, 0L, 0L), zone3 = c(1L, 0L, 1L, 0L, 0L, 0L), zone4 = c(0L, 
    1L, 0L, 0L, 0L, 0L), lic1 = c(0L, 0L, 0L, 0L, 0L, 0L), lic2 = c(1L, 
    0L, 1L, 0L, 1L, 1L), lic3 = c(0L, 1L, 0L, 1L, 0L, 0L), enf1 = c(0L, 
    0L, 1L, 0L, 1L, 0L), enf2 = c(0L, 0L, 0L, 1L, 0L, 1L), enf3 = c(1L, 
    1L, 0L, 0L, 0L, 0L), chid = 1:6), row.names = c(4L, 5L, 7L, 
8L, 10L, 12L), class = "data.frame")

I have run into an error when running the code:

dfml <- dfidx(df, idx=list(c("chid", "respondent")), 
              choice="Alternative", varying=6:20, sep ="")

"Error in reshapeLong(data, idvar = idvar, timevar = timevar, varying = varying, : 'varying' arguments must be the same length"

I have checked the data, and every column in 6:20 has the same length; however, some respondents chose some of the options more often than others. Can someone point out where I have gone wrong? It's my first attempt at analyzing choice experiment data.


Solution

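  • The error is not about the number of rows. It means that price has five numbered columns (price1 to price5), whereas zone, lic and enf have fewer (four, three and three). With sep="", the columns are grouped by name stem and numeric suffix, and the underlying reshape requires every stem to have the same number of columns, so dfidx can't handle this layout. You need to provide the missing columns, at least as NA columns.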

    df <- transform(df, zone5=NA, lic4=NA, lic5=NA, enf4=NA, enf5=NA)
    
    library(mlogit)
    
    dfml <- dfidx(df, idx=list(c("chid","respondent")), choice="Alternative", 
                  varying=grep('^price|^zone|^lic|^enf', names(df)), sep="")
    
    dfml
    # ~~~~~~~
    #   first 10 observations out of 30 
    # ~~~~~~~
    #    Choice.Set Alternative code Choice price zone lic enf idx
    # 1           4       FALSE    7      1     0    0   0   0 1:1
    # 2           4        TRUE    7      1     0    0   1   0 1:2
    # 3           4       FALSE    7      1     0    1   0   1 1:3
    # 4           4       FALSE    7      1     1    0  NA  NA 1:4
    # 5           4       FALSE    7      1     0   NA  NA  NA 1:5
    # 6           5        TRUE    9      1     0    0   0   0 2:1
    # 7           5       FALSE    9      1     1    0   0   0 2:2
    # 8           5       FALSE    9      1     0    0   1   1 2:3
    # 9           5       FALSE    9      1     0    1  NA  NA 2:4
    # 10          5       FALSE    9      1     0   NA  NA  NA 2:5
    # 
    # ~~~ indexes ~~~~
    #    chid respondent id2
    # 1     1          1   1
    # 2     1          1   2
    # 3     1          1   3
    # 4     1          1   4
    # 5     1          1   5
    # 6     2          1   1
    # 7     2          1   2
    # 8     2          1   3
    # 9     2          1   4
    # 10    2          1   5
    # indexes:  1, 1, 2 
    

    I use grep here to identify the varying= columns by name. Avoid specifying columns by position; it is fragile, since the column order can easily change with small edits to the script. Here, for example, 6:20 would miss the NA columns that transform() appends at the end of the data frame.
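
The same idea can be checked and automated in base R. The following is only a sketch under stated assumptions: the toy data frame, the `stems` vector and the helper names (`pattern`, `counts`, `wanted`, `varying_idx`) are made up for illustration and are not part of mlogit or dfidx. It counts the numbered columns per name stem, pads the shorter stems with NA columns, and then selects the varying columns by name.

```r
# Toy data with the same naming pattern as in the question (values are made up).
df <- data.frame(chid = 1:2,
                 price1 = 0:1, price2 = 1:0, price3 = 0L,
                 zone1 = 1:0, zone2 = 0:1,
                 lic1 = 0L)

# Name stems of the alternative-specific variables (an assumption of this sketch).
stems <- c("price", "zone", "lic")
pattern <- paste0("^(", paste(stems, collapse = "|"), ")[0-9]+$")

# Count the numbered columns per stem; unequal counts are what triggers
# "'varying' arguments must be the same length".
vary_cols <- grep(pattern, names(df), value = TRUE)
counts <- table(factor(sub("[0-9]+$", "", vary_cols), levels = stems))
print(counts)  # here: price has 3 columns, zone has 2, lic has 1

# Pad every stem up to the largest count with NA columns.
n_max <- max(counts)
wanted <- as.vector(outer(stems, seq_len(n_max), paste0))
for (nm in setdiff(wanted, names(df))) df[[nm]] <- NA

# Select the varying columns by name rather than by position.
varying_idx <- grep(pattern, names(df))
```

With real data, `varying_idx` could then be passed as the varying= argument in the dfidx call above. Whether NA-padded alternatives are appropriate for the model being fitted is a separate question that this sketch does not address.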