I am using the data.table package for a table like this:
DT <- data.table(id=rep(1:100, each=50),
grp=rep(letters[1:4], each=1250),
time=rep(1:50,100),
outcome=rnorm(5000),
seconds=rep(1:500,10),
weights=rnorm(5000),
response=rep(1:200, each=25),
key=c("grp", "time"))
I would like to create a new (possibly rbindlisted) data table of some summary statistics from this table. I first created two intermediary tables a and b,
a <- DT[, list(mean = weighted.mean(outcome, weights),
median=median(outcome),seconds), by=c("grp","time")]
b <- DT[, list(mean=weighted.mean(response, seconds),
median=median(response)), by=c("grp","time")]
and then am trying to rowbind these together across all groups but still preserve the grouping along the rows. This does not work:
DTfinal <- data.table(DT$grp, DT$time,
outcomemean=a$mean, responsemean=b$mean,
outcomemedian=a$median, responsemedian=b$median)
I don't think a merge works since a and b have different lengths. Rowbinding a and b also mixes up the different means and medians of a and b, ideally I would like a rbindlist that has some kind of suffix for each column like c(".a",".b").
Update: I get an error (since a and b have different dimension) doing
DTfinal <- rbindlist(setNames(list(a[, c("grp", "time", "mean", "median"),
with = FALSE],
b[, c("grp", "time", "mean", "median"),
with = FALSE]),
c("a", "b")),
idcol= "id")
dcast(DTfinal, grp + time ~id, value.var = c('mean', 'median'))
where it returns
Aggregate function missing, defaulting to 'length'
We can use rbindlist
after placing the datasets in a list
DTfinal <- rbindlist(list(a,b))
dim(DTfinal)
#[1] 400 4
dim(a)
#[1] 200 4
dim(b)
#[1] 200 4
Suppose if both datasets have different number of columns, and we have a vector of column names that we need to keep
nm1 <- intersect(names(a), names(b))
rbindlist(list(a[, nm1, with = FALSE], b[, nm1, with = FALSE]), idcol= "id")
If we need to convert to 'wide' format
DTfinal <- rbindlist(setNames(list(a,b), c("a", "b")), idcol= "id")
dcast(DTfinal, grp + time ~id, value.var = c('mean', 'median'))
# grp time mean_a mean_b median_a median_b
# 1: a 1 0.52171471 25.99502 -0.06558068 25
# 2: a 2 0.36445108 25.99010 0.13518412 25
# 3: a 3 0.08993721 25.98522 0.20128790 25
# 4: a 4 -64.04617391 25.98039 0.40999376 25
# 5: a 5 0.81730847 25.97561 -0.03481697 25
# ---
#196: d 46 1.62818374 176.67568 -0.26695999 176
#197: d 47 -1.45259871 176.67340 0.14893356 176
#198: d 48 9.59796683 176.67114 -0.05834959 176
#199: d 49 -2.74285453 176.66890 -0.22094347 176
#200: d 50 1.22109043 176.66667 -0.08172928 176