Search code examples
rreshape2dcast

R - Pivot table with subtotals


How do I get Pivots with subtotals (like in MS Excel Pivot Tables) in R?

I am using dcast from reshape2 package to create pivots in R. I also got grand totals working using rowSums and colSums. I admit I do not understand the intricacies in the dcast parameter set. I just know how to create the pivot and the help file is going over my head. It will be extremely helpful if someone can solve this using dcast (I suspect it can do it all), and explain the parameters necessary for the solution.

I am using this code (C2 has two factors, X1 & X2):

PIV <- dcast(DF, C1~C2, value.var="C3", sum)

I am not getting what all these margins, subset, fills or drop does. The help file seems overly obtuse, and I did not get much help on the internet as well.

Sample Pivot (without subtotal, T is total):

C1 X1 X2 T
a  12 1  13
a  14 2  16
b  16 3  19
b  11 4  15
b  8  5  13
T  61 15 76

Sample Expected Pivot (with subtotal t, T is total)

C1 X1 X2 T
a  12 1  13
a  14 2  16
ta 26 3  29
b  16 3  19
b  11 4  15
b  8  5  13
tb 35 12 47
T  61 15 76

Solution

  • Assuming you're starting without any totals, with something like this:

    mydf <- structure(list(C1 = c("a", "a", "b", "b", "b"), X1 = c(12L, 14L, 
        16L, 11L, 8L), X2 = 1:5), .Names = c("C1", "X1", "X2"), row.names = c(NA, 
        5L), class = "data.frame")
    
    mydf
    ##   C1 X1 X2
    ## 1  a 12  1
    ## 2  a 14  2
    ## 3  b 16  3
    ## 4  b 11  4
    ## 5  b  8  5
    

    Then you would have to use the margins argument for dcast to get the output you're after.

    library(reshape2)
    mydfl <- melt(mydf)
    mydfl$ind <- with(mydfl, ave(C1, C1, variable, FUN = seq_along))
    dcast(mydfl, C1 + ind ~ variable, sum, 
          margins = c("C1", "ind", "variable"))
    #      C1   ind X1 X2 (all)
    # 1     a     1 12  1    13
    # 2     a     2 14  2    16
    # 3     a (all) 26  3    29
    # 4     b     1 16  3    19
    # 5     b     2 11  4    15
    # 6     b     3  8  5    13
    # 7     b (all) 35 12    47
    # 8 (all) (all) 61 15    76
    

    The "margins" argument is what is used to get the subtotals. Here, we're taking subtotals according to the "C1" (Grand Total) and "ind" variable (Subtotal), and we're also adding row totals (using "variable").

    That said, I'm not sure if I'd recommend this as you're sort of mixing data and analysis in the same table.