Generate Dummy Variables From Data Frame


I have a data.frame with the following properties:

list1 <- c(145540,145560, 157247, 145566)
list2 <- c(166927, NA, NA, NA)
list3 <- c(145592, 145560, 145566, NA)
df <- data.frame(list1, list2, list3)

I would like to generate a dummy variable for each of the ids it contains. The result should look like this:

list,  145540, 145560, 145566, 145592, 157247, 166927  (= all possible ids in the data)
list1,      1,      1,      1,      0,      1,      0
list2,      0,      0,      0,      0,      0,      1
list3,      0,      1,      1,      1,      0,      0

Any ideas on how to achieve this? Thank you!


Solution

  • My answer is a little clunkier, but here it is:

    all.vals <- na.omit(unique(unlist(df)))  ## get full set of values
    

    Use a for loop for greater clarity:

    df2 <- list()
    for (i in seq_along(df)) {
      ## %in% is vectorised, so every id is tested against column i in one call
      df2[[i]] <- as.numeric(all.vals %in% df[[i]])
    }
    names(df2) <- names(df)
    ## add labels as the first column:
    df2 <- data.frame(all.vals, df2)
    

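    Result (one row per id and one column per list, i.e. the transpose of the layout requested in the question):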

      all.vals list1 list2 list3
    1   145540     1     0     0
    2   145560     1     0     1
    3   157247     1     0     0
    4   145566     1     0     1
    5   166927     0     1     0
    6   145592     0     0     1
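
The table above has ids as rows and lists as columns, while the question asks for the opposite orientation with the ids sorted. A minimal sketch of one way to get that layout is shown here, assuming a plain integer matrix is an acceptable result; the names `ids` and `dummies` are illustrative and not part of the original answer.

```r
list1 <- c(145540, 145560, 157247, 145566)
list2 <- c(166927, NA, NA, NA)
list3 <- c(145592, 145560, 145566, NA)
df <- data.frame(list1, list2, list3)

## sorted set of all ids; sort() drops NA by default
ids <- sort(unique(unlist(df)))

## sapply() yields one column per list, so transpose to get one row per list
dummies <- t(sapply(df, function(col) as.integer(ids %in% col)))
colnames(dummies) <- ids
dummies
```

Each row of `dummies` is named after a list and each column after an id, so a single indicator can be looked up with, for example, `dummies["list3", "145566"]`.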