Manipulating a factor and category in R


I have a data set that I am trying to reshape, and I can't seem to find the right way to do it. I've looked into using dcast and spread, but I'm not sure how to get the result I want.

I have something like:

ID var1 var2 var3 category
--------------------------
1  x    x    x     a
1  x    x    x     b
1  x    x    x     b
2  y    y    y     a
2  y    y    y     b
2  y    y    y     c
3  z    z    z     b 
3  z    z    z     b
3  z    z    z     c

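I'd like it to look like this, with one row per ID and one column per category (1 if the ID has that category at least once, 0 otherwise):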

ID var1 var2 var3  a  b  c 
--------------------------------
1  x    x    x     1  1  0 
2  y    y    y     1  1  1
3  z    z    z     0  1  1  

Example data to reproduce this:

ID <- c(1,1,1,2,2,2,3,3,3)
var1 <- c('x','x','x','y','y','y','z','z','z')
var2 <- c('x','x','x','y','y','y','z','z','z')
var3 <- c('x','x','x','y','y','y','z','z','z')
category <- c('a','b','b','a','b','c','b','b','c')

dat <- data.frame(ID,var1,var2,var3,category)

Solution

  • ID <- c(1,1,1,2,2,2,3,3,3)
    var1 <- c("x","x","x","y","y","y","z","z","z")
    var2 <- c("x","x","x","y","y","y","z","z","z")
    var3 <- c("x","x","x","y","y","y","z","z","z")
    category <- c("a","b","b","a","b","c","b","b","c")
    
    dat <- data.frame(ID,var1,var2,var3,category)
    
    library(tidyr)
    library(dplyr)
    
    dat %>%
      distinct() %>%                   # drop duplicate rows (e.g. ID 1 has category b twice)
      mutate(value = 1) %>%            # flag every remaining ID/category pair with 1
      spread(category, value, fill=0)  # one column per category; missing pairs become 0
    
    #   ID var1 var2 var3 a b c
    # 1  1    x    x    x 1 1 0
    # 2  2    y    y    y 1 1 1
    # 3  3    z    z    z 0 1 1
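
In newer tidyr releases spread() is marked as superseded in favour of pivot_wider(). The same reshape can probably be written as below. This is only a sketch, assuming tidyr 1.1.0 or later (so that values_fill accepts a single value); the name `wide` is just for illustration and is not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
dat <- data.frame(
  ID       = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  var1     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  var2     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  var3     = c("x", "x", "x", "y", "y", "y", "z", "z", "z"),
  category = c("a", "b", "b", "a", "b", "c", "b", "b", "c")
)

# `wide` is a hypothetical name for the reshaped result
wide <- dat %>%
  distinct() %>%              # drop duplicate ID/category rows
  mutate(value = 1) %>%       # flag each remaining pair with 1
  pivot_wider(
    names_from  = category,   # new column names come from category
    values_from = value,      # cell values come from the flag
    values_fill = 0           # assumption: tidyr >= 1.1.0 accepts a scalar here
  )

print(wide)
```

The distinct() step still matters here: without it, pivot_wider() would find two values for ID 1 / category b and warn that the values are not uniquely identified.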