Search code examples
rplyrsurveyweighted

weighted table data frame with plyr


I'm working with survey data consisting of integer value responses for multiple questions (y1, y2, y3, ...) and a weighted count assigned to each respondent, like this:

foo <- data.frame(wcount = c(10, 1, 2, 3),        # weighted counts
                  y1 = sample(1:5, 4, replace=T), # numeric responses
                  y2 = sample(1:5, 4, replace=T), #
                  y3 = sample(1:5, 4, replace=T)) #
>foo
  wcount y1 y2 y3
1     10  5  5  5
2      1  1  4  4
3      2  1  2  5
4      3  2  5  3

and I'd like to transform this into a consolidated data frame version of a weighted table, with the first column representing the response values, and the next 3 columns representing the weighted counts. This can be done explicitly by column using:

library(Hmisc)
ty1 <- wtd.table(foo$y1, foo$wcount)
ty2 <- wtd.table(foo$y2, foo$wcount)
ty3 <- wtd.table(foo$y3, foo$wcount)

bar <- merge(ty1, ty2, all=T, by="x")
bar <- merge(bar, ty3, all=T, by="x")

names(bar) <- c("x", "ty1", "ty2", "ty3")
bar[is.na(bar)]<-0
>bar
  x ty1 ty2 ty3
1 1   3   0   0
2 2   3   2   0
3 3   0   0   3
4 4   0   1   1
5 5  10  13  12

I suspect there is a way of automating this with plyr and numcolwise or ddply. For instance, the following comes close, but I'm not sure what else is needed to finish the job:

library(plyr)
bar2 <- numcolwise(wtd.table)(foo[c("y1","y2","y3")], foo$wcount)
>bar2
        y1       y2       y3
1  1, 2, 5  2, 4, 5  3, 4, 5
2 3, 3, 10 2, 1, 13 3, 1, 12

Any thoughts?


Solution

  • Not a plyr answer, but this struck me as a reshaping/aggregating problem that could be tackled straightforwardly using functions from package reshape2.

    First, melt the dataset, making a column of the response value which can be named x (the unique values in y1-y3).

    library(reshape2)
    dat2 = melt(foo, id.var = "wcount", value.name = "x")
    

    Now this can be cast back wide with dcast, using sum as the aggregation function. This puts y1-y3 back as columns with the sum of wcount for each value of x.

    # Cast back wide using the values within y1-y3 as response values
        # and filling with the sum of "wcount"
    dcast(dat2, x ~ variable, value.var = "wcount", fun = sum)
    

    Giving

      x y1 y2 y3
    1 1  3  0  0
    2 2  3  2  0
    3 3  0  0  3
    4 4  0  1  1
    5 5 10 13 12