
Convert data from wide to long format keeping all alternatives of the dv and adding a choice variable


I am trying to get my data to work with the mlogit package in R. Converting the wide data to long format with the mlogit.data command failed, so I tried to do it myself using melt.

This is what I have so far (IssueID is the case identifier, dv will be the dependent variable, table is the data in wide format, newdata is the data in long format):

library(data.table)  # provides setDT() and the data.table method for melt()

IssueID <- c(1, 2, 3)
dv <- c(1, 2, 3)
table <- data.frame(IssueID, dv)

newdata <- melt(setDT(table), id.vars = "IssueID", measure.vars = "dv")

Wide format:

   IssueID dv
1:       1  1
2:       2  2
3:       3  3

Long format:

   IssueID variable value
1:       1       dv     1
2:       2       dv     2
3:       3       dv     3

However, to fit a model with mlogit, I need a dataset that contains, for each case, one row per possible value (alternative) of the dependent variable, plus a dummy that indicates which of these alternatives the unit of observation actually chose.

The usable data should look like this:

case2 <- rep(1:3, each = 3)
variable2 <- rep("dv", 9)
value2 <- rep(1:3, times = 3)
choice2 <- c(1, 0, 0, 0, 1, 0, 0, 0, 1)
# data.frame() keeps the column types; cbind() would coerce everything to character
newdata2 <- data.frame(case2, variable2, value2, choice2)

  case2 variable2 value2 choice2
1     1        dv      1       1
2     1        dv      2       0
3     1        dv      3       0
4     2        dv      1       0
5     2        dv      2       1
6     2        dv      3       0
7     3        dv      1       0
8     3        dv      2       0
9     3        dv      3       1

Do you have any suggestions for code that does this, so that I don't have to create the choice variable by hand? Thank you for your assistance.


Solution

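  • You can probably achieve this from the long format of the data using complete() and fill() from tidyr. complete() adds the missing IssueID/value combinations, setting choice to 0 in the added rows, and fill() carries variable down into them.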

    library(dplyr)
    library(tidyr)
    
    df %>%
      mutate(choice = 1) %>%
      complete(IssueID, value = seq(min(value), max(value)), 
               fill = list(choice = 0)) %>%
      fill(variable)
    
    
    #  IssueID value variable choice
    #    <int> <int> <fct>     <dbl>
    #1       1     1 dv            1
    #2       1     2 dv            0
    #3       1     3 dv            0
    #4       2     1 dv            0
    #5       2     2 dv            1
    #6       2     3 dv            0
    #7       3     1 dv            0
    #8       3     2 dv            0
    #9       3     3 dv            1
    

    data

    df <- structure(list(IssueID = 1:3, variable = structure(c(1L, 1L, 
    1L), .Label = "dv", class = "factor"), value = 1:3),
    class = "data.frame", row.names = c(NA, -3L))
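
If you would rather avoid extra packages, the same idea can be sketched in base R, starting directly from the wide data. This is only a minimal sketch: the names wide, alternatives, chosen and long are made up for illustration, and it assumes that every possible alternative is chosen at least once somewhere in dv (otherwise you would have to list the alternatives yourself).

```r
# Wide input as in the question: one row per case with the chosen alternative.
wide <- data.frame(IssueID = 1:3, dv = 1:3)

# Assumption: the set of alternatives is exactly the set of values seen in dv.
alternatives <- sort(unique(wide$dv))

# One row per (case, alternative) pair; value varies fastest within each case.
long <- expand.grid(value = alternatives, IssueID = wide$IssueID)

# Look up the alternative chosen by each row's case and flag the matching row.
chosen <- wide$dv[match(long$IssueID, wide$IssueID)]
long$choice <- as.integer(long$value == chosen)

# Order rows and columns like the layout requested in the question.
long <- long[order(long$IssueID, long$value), c("IssueID", "value", "choice")]
rownames(long) <- NULL
long
```

Each case ends up with one row per alternative, and choice is 1 only on the row whose value equals that case's dv. A constant variable column can be added afterwards with long$variable <- "dv" if you need it.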