library(data.table)
data = data.table("STUDENT" = c(1:100),
"SAMPLEWEIGHT" = sample(12:99, r = T, 100),
"LABEL1" = sample(1:2, r = T, 100),
"LABEL3" = sample(1:3, r = T, 100),
"CAT"=sample(0:1,r = T, 100),
"FOX"=sample(0:1,r = T, 100),
"DOG"=sample(0:1,r = T, 100),
"MOUSE"=sample(0:1,r = T, 100),
"BIRD"=sample(0:1,r = T, 100))
dataWANT = data.frame("LABEL1" = c(1,1,1,2,2,2),
"LABEL3" = c(1,2,3,1,2,3),
"CAT_N" = NA,
"CAT_PER" = NA,
"FOX_N" = NA,
"FOX_PER" = NA,
"DOG_N" = NA,
"DOG_PER" = NA,
"MOUSE_N" = NA,
"MOUSE_PER" = NA,
"BIRD_N" = NA,
"BIRD_PER" = NA)
I have a data.table call it data, and am attempting to try and summarize the student data like what is shown in dataWANT.
in dataWANT the columns that have _N at the end is just the count of values in the column that equals to 1 for each LABEL1 and LABEL3 combination so a total of 6 groups.
in dataWANT the columns that have _PER at the end is the weighted proportion of the groups that have ones in their column.
An option using data.table
would be to group by 'LABEL1', 'LABEL3', specify the columns of interest in .SDcols
, get the sum
(as it binary columns) by looping over the .SD
and concatenate with the weighted.mean
based on the 'SAMPLEWEIGHT' column
library(data.table)
data[, c(setNames(lapply(.SD, sum), paste0(names(.SD), "_N")),
setNames(lapply(.SD, function(x) weighted.mean(x == 1, SAMPLEWEIGHT)),
paste0(names(.SD), "_PER"))),.(LABEL1, LABEL3), .SDcols = CAT:BIRD]