R Data.Table With Weights

library(data.table)
data = data.table("STUDENT" = c(1:100),
                  "SAMPLEWEIGHT" = sample(12:99, r = T, 100),
"LABEL1" = sample(1:2, r = T, 100),
"LABEL3" = sample(1:3, r = T, 100),
"CAT"=sample(0:1,r = T, 100),
"FOX"=sample(0:1,r = T, 100),
"DOG"=sample(0:1,r = T, 100),
"MOUSE"=sample(0:1,r = T, 100),
"BIRD"=sample(0:1,r = T, 100))

dataWANT = data.frame("LABEL1" = c(1,1,1,2,2,2),
                                            "LABEL3" = c(1,2,3,1,2,3),
                                            "CAT_N" = NA,
                                            "CAT_PER" = NA,
                                            "FOX_N" = NA,
                                            "FOX_PER" = NA,
                                            "DOG_N" = NA,
                                            "DOG_PER" = NA,
                                            "MOUSE_N" = NA,
                                            "MOUSE_PER" = NA,
                                            "BIRD_N" = NA,
                                            "BIRD_PER" = NA)

I have a data.table call it data, and am attempting to try and summarize the student data like what is shown in dataWANT.

in dataWANT the columns that have _N at the end is just the count of values in the column that equals to 1 for each LABEL1 and LABEL3 combination so a total of 6 groups.

in dataWANT the columns that have _PER at the end is the weighted proportion of the groups that have ones in their column.

Solution

An option using data.table would be to group by 'LABEL1', 'LABEL3', specify the columns of interest in .SDcols, get the sum (as it binary columns) by looping over the .SD and concatenate with the weighted.mean based on the 'SAMPLEWEIGHT' column

library(data.table)
data[, c(setNames(lapply(.SD, sum), paste0(names(.SD), "_N")), 
  setNames(lapply(.SD, function(x) weighted.mean(x == 1, SAMPLEWEIGHT)), 
     paste0(names(.SD), "_PER"))),.(LABEL1, LABEL3), .SDcols = CAT:BIRD]