r data-manipulation melt dataformat mlogit

Convert data from wide to long format keeping all alternatives of the dv and adding a choice variable

I am trying to get my data to work with the mlogit package in R. I could not convert the wide data to long format with the mlogit.data command, so I tried doing it myself using melt.

This is what I have so far (IssueID is the case identifier, dv is the dependent variable, table is the data in wide format, and newdata is the data in long format):

library(data.table)  # provides setDT() and melt() for data.tables

IssueID<-c(1,2,3)
dv<-c(1,2,3)
table<-as.data.frame(cbind(IssueID, dv))

newdata<-melt(setDT(table), id.vars = c("IssueID"), measure.vars = c("dv"))

Wide format:

   IssueID dv
1:       1  1
2:       2  2
3:       3  3

Long format:

   IssueID variable value
1:       1       dv     1
2:       2       dv     2
3:       3       dv     3

However, mlogit needs a dataset that lists every alternative of the dependent variable for each case, plus a dummy variable indicating which of these alternatives that case actually chose.

The usable data should look like this:

#case2<-c(1,1,1,2,2,2,3,3,3)
#variable2<-(c("dv","dv","dv","dv","dv","dv","dv","dv","dv"))
#value2<-c(1,2,3,1,2,3,1,2,3)
#choice2<-c(1,0,0,0,1,0,0,0,1)
#newdata2<-data.frame(case2, variable2, value2, choice2)  # data.frame() keeps the numeric columns numeric; cbind() would coerce everything to character

  case2 variable2 value2 choice2
1     1        dv      1       1
2     1        dv      2       0
3     1        dv      3       0
4     2        dv      1       0
5     2        dv      2       1
6     2        dv      3       0
7     3        dv      1       0
8     3        dv      2       0
9     3        dv      3       1

Do you have any suggestions for code that does this, so that I don't have to code the choice variable manually? Thank you for your assistance.

Solution

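You can probably get there from the long format of the data using complete and fill from tidyr. complete adds the missing IssueID/value combinations (setting choice to 0 for them), and fill carries variable down into the newly created rows.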

library(dplyr)
library(tidyr)

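df %>%
  mutate(choice = 1) %>%                   # every observed row is the chosen alternative
  complete(IssueID, value = seq(min(value), max(value)),
           fill = list(choice = 0)) %>%    # add the unchosen alternatives with choice = 0
  fill(variable)                           # carry "dv" down into the new rows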


#  IssueID value variable choice
#    <int> <int> <fct>     <dbl>
#1       1     1 dv            1
#2       1     2 dv            0
#3       1     3 dv            0
#4       2     1 dv            0
#5       2     2 dv            1
#6       2     3 dv            0
#7       3     1 dv            0
#8       3     2 dv            0
#9       3     3 dv            1
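
For comparison, the same expansion can be sketched in base R without extra packages. This is only a minimal sketch under a few assumptions: the long data has one row per case with the chosen alternative stored in value (as in df), the set of alternatives is taken to be the values observed in the data, and the helper names alts, long and chosen are made up for illustration.

```r
# Long data: one row per case, `value` holds the alternative that case chose
df <- data.frame(IssueID = 1:3, variable = "dv", value = 1:3)

# Assumed set of alternatives: every value observed in the data
alts <- sort(unique(df$value))

# All case x alternative combinations; the first argument of
# expand.grid() varies fastest, so rows come out grouped by IssueID
long <- expand.grid(value = alts, IssueID = df$IssueID)

# Look up the alternative each case actually chose, then flag the match
chosen <- df$value[match(long$IssueID, df$IssueID)]
long$choice <- as.integer(long$value == chosen)

# Restore the variable column and put the columns in a readable order
long$variable <- "dv"
long <- long[, c("IssueID", "variable", "value", "choice")]
long
```

Each case ends up with one row per alternative, and choice is 1 only on the row whose value matches what that case chose. If some alternatives are never chosen by anyone, alts would have to be written out by hand instead of derived from the data.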

data

df <- structure(list(IssueID = 1:3, variable = structure(c(1L, 1L, 
1L), .Label = "dv", class = "factor"), value = 1:3),
class = "data.frame", row.names = c(NA, -3L))