Search code examples
rdataframesubtotal

Calculating subtotals (sum, stdev, average etc)


I have been searching for this for a while, but haven't been able to find a clear answer so far. Probably have been looking for the wrong terms, but maybe somebody here can quickly help me. The question is kind of basic.

Sample data set:

set <- structure(list(VarName = structure(c(1L, 5L, 4L, 2L, 3L),
 .Label = c("Apple/Blue/Nice", 
"Apple/Blue/Ugly", "Apple/Pink/Ugly", "Kiwi/Blue/Ugly", "Pear/Blue/Ugly"
), class = "factor"), Color = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("Blue", 
"Pink"), class = "factor"), Qty = c(45L, 34L, 46L, 21L, 38L)), .Names = c("VarName", 
"Color", "Qty"), class = "data.frame", row.names = c(NA, -5L))

This gives a data set like:

set


      VarName      Color Qty
1 Apple/Blue/Nice  Blue  45
2  Pear/Blue/Ugly  Blue  34
3  Kiwi/Blue/Ugly  Blue  46
4 Apple/Blue/Ugly  Blue  21
5 Apple/Pink/Ugly  Pink  38

What I would like to do is fairly straight forward. I would like to sum (or averages or stdev) the Qty column. But, also I would like to do the same operation under the following conditions:

  1. VarName includes "Apple"
  2. VarName includes "Ugly"
  3. Color equals "Blue"

Anybody that can give me a quick introduction on how to perform this kind of calculations?

I am aware that some of it can be done by the aggregate() function, e.g.:

aggregate(set[3], FUN=sum, by=set[2])[1,2]

However, I believe that there is a more straight forward way of doing this then this. Are there some filters that can be added to functions like sum()?


Solution

  • Is this what you're looking for?

     # sum for those including 'Apple'
     apple <- set[grep('Apple', set[, 'VarName']), ]
     aggregate(apple[3], FUN=sum, by=apple[2])
      Color Qty
    1  Blue  66
    2  Pink  38
    
     # sum for those including 'Ugly'
     ugly <- set[grep('Ugly', set[, 'VarName']), ]
     aggregate(ugly[3], FUN=sum, by=ugly[2])
      Color Qty
    1  Blue 101
    2  Pink  38
    
     # sum for Color==Blue
     sum(set[set[, 'Color']=='Blue', 3])
    [1] 146
    

    The last sum could be done by using subset

    sum(subset(set, Color=='Blue')[,3])