I have been searching for this for a while, but haven't been able to find a clear answer so far. Probably have been looking for the wrong terms, but maybe somebody here can quickly help me. The question is kind of basic.
Sample data set:
set <- structure(list(VarName = structure(c(1L, 5L, 4L, 2L, 3L),
.Label = c("Apple/Blue/Nice",
"Apple/Blue/Ugly", "Apple/Pink/Ugly", "Kiwi/Blue/Ugly", "Pear/Blue/Ugly"
), class = "factor"), Color = structure(c(1L, 1L, 1L, 1L, 2L), .Label = c("Blue",
"Pink"), class = "factor"), Qty = c(45L, 34L, 46L, 21L, 38L)), .Names = c("VarName",
"Color", "Qty"), class = "data.frame", row.names = c(NA, -5L))
This gives a data set like:
set
VarName Color Qty
1 Apple/Blue/Nice Blue 45
2 Pear/Blue/Ugly Blue 34
3 Kiwi/Blue/Ugly Blue 46
4 Apple/Blue/Ugly Blue 21
5 Apple/Pink/Ugly Pink 38
What I would like to do is fairly straight forward. I would like to sum (or averages or stdev) the Qty column. But, also I would like to do the same operation under the following conditions:
Anybody that can give me a quick introduction on how to perform this kind of calculations?
I am aware that some of it can be done by the aggregate() function, e.g.:
aggregate(set[3], FUN=sum, by=set[2])[1,2]
However, I believe that there is a more straight forward way of doing this then this. Are there some filters that can be added to functions like sum()
?
Is this what you're looking for?
# sum for those including 'Apple'
apple <- set[grep('Apple', set[, 'VarName']), ]
aggregate(apple[3], FUN=sum, by=apple[2])
Color Qty
1 Blue 66
2 Pink 38
# sum for those including 'Ugly'
ugly <- set[grep('Ugly', set[, 'VarName']), ]
aggregate(ugly[3], FUN=sum, by=ugly[2])
Color Qty
1 Blue 101
2 Pink 38
# sum for Color==Blue
sum(set[set[, 'Color']=='Blue', 3])
[1] 146
The last sum could be done by using subset
sum(subset(set, Color=='Blue')[,3])