
Error with levels using mlogit in R


I am having some trouble with factor levels when running the following:

library(mlogit)

panel.datasm = data.frame(
    cbind( 
        round(runif(100, min=1, max=6)), 
        rep(1:20,each=5), runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6) , 
        runif(100, min=0, max=1), 
        runif(100, min=0, max=6), 
        runif(100, min=2, max=6)  ))
names(panel.datasm) = c("choice", "id", "data_1991","data_1992",
  "data_1993", "data2_1991", "data2_1992","data2_1993") 


logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice", 
    varying= 3:5, shape = "wide", sep = "_")

I keep getting the error: Error in Ops.factor(data[[choice]], alt) : level sets of factors are different

I have also tried assigning levels manually:

panel.datasm$id= factor(
    panel.datasm$id, 
    levels = sort(as.character(unique(panel.datasm$id)))  )

I have tried a number of things and can't figure out what is going wrong. For comparison, take a look at:

data("Electricity", package = "mlogit")
head(Electricity)
Electr <- mlogit.data(Electricity, id = "id", choice = "choice", 
    varying = 3:26, shape = "wide", sep = "")

Which, as far as I can tell, is identical to my data format. What's going on here? I'm at my wits' end.


Solution

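  • I believe I have traced the problem. The values in your choice column must match the alternative names, which mlogit.data takes from the suffixes of the varying columns (here 1991, 1992 and 1993, the part after sep = "_"). Your choice column holds values from 1 to 6, so the two sets of levels differ and the comparison in Ops.factor fails.

    If you change the first column of your data.frame to take values in 1991:1993, it will work.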

    panel.datasm = data.frame(
        cbind( 
            sample(1991:1993, 100, replace=TRUE), 
            rep(1:20,each=5), runif(100, min=0, max=1), 
            runif(100, min=0, max=6), 
            runif(100, min=2, max=6) , 
            runif(100, min=0, max=1), 
            runif(100, min=0, max=6), 
            runif(100, min=2, max=6)  ))
    names(panel.datasm) = c("choice", "id", "data_1991","data_1992",
        "data_1993", "data2_1991", "data2_1992","data2_1993") 
    
    
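    # varying = 3:8 so that both data_* and data2_* are reshaped,
    # which is what the output shown below reflects
    logit.data <- mlogit.data(panel.datasm, id = "id", choice = "choice", 
        varying = 3:8, shape = "wide", sep = "_") 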
    

    The results:

    head(logit.data)
           choice id  alt       data     data2 chid
    1.1991  FALSE  1 1991 0.03540498 0.9726110    1
    1.1992  FALSE  1 1992 5.85285278 2.7973798    1
    1.1993   TRUE  1 1993 5.80795641 3.7360297    1
    2.1991   TRUE  1 1991 0.59255235 0.2564928    2
    2.1992  FALSE  1 1992 5.81443351 3.0820215    2
    2.1993  FALSE  1 1993 2.11699854 5.4161634    2
    

    If you now compare it with Electricity, the difference is obvious. There the choice column takes values 1 to 4, and the suffixes of the varying columns (pf1 to pf4, cl1 to cl4, and so on) also run from 1 to 4.

    head(Electricity)
      choice id pf1 pf2 pf3 pf4 cl1 cl2 cl3 cl4 loc1 loc2 loc3 loc4 wk1 wk2 wk3 wk4
    1      4  1   7   9   0   0   5   1   0   5    0    1    0    0   1   0   0   1
    2      3  1   7   9   0   0   0   5   1   5    0    0    1    0   1   1   0   0
    3      4  1   9   7   0   0   5   1   0   0    0    0    0    1   0   1   1   0
    4      4  1   0   9   7   0   1   1   0   5    0    0    1    0   1   0   0   1
    5      1  1   0   9   0   7   0   1   0   5    1    0    0    0   0   1   0   1
    6      4  1   0   9   0   7   0   0   1   5    0    0    1    0   0   0   0   1
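
To check for this mismatch before calling mlogit.data, something along these lines may help. It is only a sketch in base R: check_choice_levels is a hypothetical helper, not part of mlogit, and it assumes a non-empty separator such as "_" (so it would not work as written for Electricity, where sep = ""). It returns the choice values that have no matching suffix among the varying columns.

```r
# Hypothetical helper (not part of mlogit): list the choice values that do
# not appear as a suffix of any wide-format ("varying") column.
# Assumes a non-empty separator such as "_".
check_choice_levels <- function(data, choice, varying, sep = "_") {
  cols <- names(data)[varying]
  # Keep the part after the last separator, e.g. "data_1991" -> "1991"
  parts <- strsplit(cols, sep, fixed = TRUE)
  alts <- unique(vapply(parts, function(p) p[length(p)], character(1)))
  # Choice values with no corresponding alternative
  setdiff(as.character(unique(data[[choice]])), alts)
}

set.seed(1)
# Same situation as in the question: choices 1..6, alternatives 1991..1993
bad <- data.frame(
  choice    = sample(1:6, 10, replace = TRUE),
  data_1991 = runif(10),
  data_1992 = runif(10),
  data_1993 = runif(10)
)
# Same data, but with choices drawn from the alternative names
good <- transform(bad, choice = sample(1991:1993, 10, replace = TRUE))

check_choice_levels(bad, "choice", 2:4)   # non-empty: these values have no alternative
check_choice_levels(good, "choice", 2:4)  # empty: every choice is a valid alternative
```

An empty result means every observed choice corresponds to one of the alternatives, which is the condition the fix above establishes.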