Tags: r, dplyr, reshape2

Reshape binary data: group by column and count


I want to reshape a data.frame to a matrix in the following format:

Input:

   Typ1 Typ2 Maths Science English History
   1    1    1     1       1       1
   0    1    0     1       0       0
   1    0    1     0       0       0

Output:

         Maths Science English History
   Typ1    2     1       1       1
   Typ2    1     2       1       1

And a second output format:

         Maths Science English History
   Typ1    1     1       1       1              
   Typ2    1     1       1       1              
   Typ2    0     1       0       0      
   Typ1    1     0       0       0

Solution

  • For the first version, you can do:

    `row.names<-`(rbind(sapply(df[-c(1:2)], function(x) sum(df[[1]] * x)),
                        sapply(df[-c(1:2)], function(x) sum(df[[2]] * x))),
                  names(df)[1:2])
    #>      Maths Science English History
    #> Typ1     2       1       1       1
    #> Typ2     1       2       1       1
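
    As an alternative sketch (not part of the original answer), the same counts can be computed as a matrix cross-product, because each cell is the sum over rows of a type flag times a subject flag. This assumes every column is strictly 0/1 and that the first two columns are the type flags; the names `types`, `subjects` and `counts` are introduced here for illustration only.

    ```r
    # Same data as in the reproducible example
    df <- data.frame(Typ1 = c(1L, 0L, 1L), Typ2 = c(1L, 1L, 0L),
                     Maths = c(1L, 0L, 1L), Science = c(1L, 1L, 0L),
                     English = c(1L, 0L, 0L), History = c(1L, 0L, 0L))

    # Assumption: columns 1:2 are the type flags, the rest are subjects
    types    <- as.matrix(df[1:2])
    subjects <- as.matrix(df[-(1:2)])

    # crossprod(x, y) computes t(x) %*% y, i.e. one row per type and
    # one column per subject, each cell summing flag * flag over rows
    counts <- crossprod(types, subjects)
    counts
    ```

    With the sample data this should give the same 2 x 4 matrix as the `sapply()` version above, and it extends to more than two type columns without extra `rbind()` calls.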
    

    The second version is a bit harder, but you could do something like this:

    df2 <- df[rep(seq(nrow(df)), times = (df$Typ1 == 1 & df$Typ2 == 1) + 1), -(1:2)]
    df2 <- as.matrix(df2)
    row.names(df2) <- names(df)[unlist(sapply(seq(nrow(df)), 
                                              function(x) which(df[x,1:2] == 1)))]
    
    df2
    #>      Maths Science English History
    #> Typ1     1       1       1       1
    #> Typ2     1       1       1       1
    #> Typ2     0       1       0       0
    #> Typ1     1       0       0       0
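
    A possible variation on the same idea (again a sketch, not the original author's code) uses `which(..., arr.ind = TRUE)` to list every (row, type) pair whose flag is 1, then emits one output row per pair. It assumes 0/1 flags in the first two columns. Unlike the version above, it silently drops rows where neither type is set, which may or may not be what you want. The names `types`, `idx` and `out` are illustrative.

    ```r
    # Same data as in the reproducible example
    df <- data.frame(Typ1 = c(1L, 0L, 1L), Typ2 = c(1L, 1L, 0L),
                     Maths = c(1L, 0L, 1L), Science = c(1L, 1L, 0L),
                     English = c(1L, 0L, 0L), History = c(1L, 0L, 0L))

    # Assumption: columns 1:2 are the type flags
    types <- as.matrix(df[1:2])

    # One (row, col) pair per flag that equals 1
    idx <- which(types == 1, arr.ind = TRUE)

    # which() returns pairs column by column; reorder to follow the input rows
    idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

    # Repeat the subject columns once per matching type and label each row
    out <- as.matrix(df[idx[, "row"], -(1:2)])
    rownames(out) <- colnames(types)[idx[, "col"]]
    out
    ```

    On the sample data the row labels come out as Typ1, Typ2, Typ2, Typ1, matching the output shown above.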
    

    Data in reproducible format

    df <- structure(list(Typ1 = c(1L, 0L, 1L), Typ2 = c(1L, 1L, 0L), Maths = c(1L, 
     0L, 1L), Science = c(1L, 1L, 0L), English = c(1L, 0L, 0L), History = c(1L, 
     0L, 0L)), class = "data.frame", row.names = c(NA, -3L))
    

    Created on 2022-10-20 with reprex v2.0.2