Search code examples
rdata.tablegroup

Group values based on criterias -R


i have this kind of dataset:

dt <- data.table(ID = c(1, 2, 3, 4),
                 q1= c(1, 2, 3, 5), 
                 q2= c(3, 5, 2, 4), 
                 q3= c(2, 3, 4, 3),
                 education = c("A", "B", "C", "D"))

I want toget a table that counts the values into different groups: All ID with value 1,3,4 should be counted in a group called "YES" All ID with value 1,3 should be counted in a group called "maybe" (some ID will be counted twice here) All ID with value 5,2 should be under "NO"

final output should be a table for each education level and based on the q's:

   YES    maybe    NO
q1
q2
q3

Hope you can help me


Solution

  • Here's a data.table-native approach:

    library(data.table)
    categs <- list(YES=c(1,3,4), maybe=c(1,3), NO=c(2,5))
    out <- melt(dt, "education", measure.vars = c("q1", "q2", "q3")
      )[, names(categs) := lapply(categs, `%in%`, x = value)
      ][, lapply(.SD, function(z) as.numeric(sum(z))), by = .(education, variable), .SDcols = names(categs)
      ][, names(categs) := lapply(.SD, function(z) z/sum(z)),
        by = .(education), .SDcols = names(categs)]
    out
    #     education variable   YES maybe    NO
    #        <char>   <fctr> <num> <num> <num>
    #  1:         A       q1   0.5   0.5   0.0
    #  2:         B       q1   0.0   0.0   0.5
    #  3:         C       q1   0.5   1.0   0.0
    #  4:         D       q1   0.0   0.0   1.0
    #  5:         A       q2   0.5   0.5   0.0
    #  6:         B       q2   0.0   0.0   0.5
    #  7:         C       q2   0.0   0.0   1.0
    #  8:         D       q2   0.5   0.0   0.0
    #  9:         A       q3   0.0   0.0   1.0
    # 10:         B       q3   1.0   1.0   0.0
    # 11:         C       q3   0.5   0.0   0.0
    # 12:         D       q3   0.5   1.0   0.0