So recently I've been learning the functions of aggregate and I find it really useful for large data sets I work with. Normally I manually manipulate data in excel which, while effective, is extremely time consuming.
So I was wondering if the following could be done. I have a data set that is similarly structured like so (but much bigger):
Fruit Crate Mass
Apple A 4
Banana A 3.4
Orange B 2
Apple C 2.1
Apple C 4.5
Banana C 5
Orange D 1
Apple D 1.3
Orange D 2.4
Orange D 3.2
Orange E 2
Banana E 1.1
Banana E 0.7
Apple E 2
Now I know with this I get Mass per crate:
TotalCrate<-aggregate(data$Mass,list(crate=data$Crate), sum)
And with this I get Mass per fruit per crate:
FruitperCrate<-aggregate(data$Mass, list(fruit=data$Fruit, crate=data$Crate), sum)
Now is there a way that I can get percent mass of fruit per crate, so basically, is there a way that I can divide the fruit mass by the total mass of for each respective crate? And if so, how can I do it for future reference.
Any help is appreciated.
Thank you
1) First aggregate by Fruit
and Crate
and then use ave
with prop.table
to get the proportions of each fruit in its crate:
ag <- aggregate(Mass ~ Fruit + Crate, data, sum)
tr <- transform(ag, percent = 100 * ave(Mass, Crate, FUN = prop.table))
giving:
> tr
Fruit Crate Mass percent
1 Apple A 4.0 54.05405
2 Banana A 3.4 45.94595
3 Orange B 2.0 100.00000
4 Apple C 6.6 56.89655
5 Banana C 5.0 43.10345
6 Apple D 1.3 16.45570
7 Orange D 6.6 83.54430
8 Apple E 2.0 34.48276
9 Banana E 1.8 31.03448
10 Orange E 2.0 34.48276
or graphically:
library(ggplot2)
ggplot(tr, aes(Crate, percent, fill = Fruit)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c("red", "yellow", "orange"))
1a) This could also be expressed in a magrittr pipeline like this:
library(magrittr)
data %>%
do(aggregate(Mass ~ Fruit + Crate, ., sum)) %>%
transform(percent = 100 * ave(Mass, Crate, FUN = prop.table))
2) and here is an alternative using dplyr that follows similar logic:
library(dplyr)
data %>%
group_by(Crate, Fruit) %>%
summarize(Mass = sum(Mass)) %>%
ungroup() %>%
group_by(Crate) %>%
mutate(percent = 100 * prop.table(Mass)) %>%
ungroup()
giving:
# A tibble: 10 x 4
Crate Fruit Mass percent
<fctr> <fctr> <dbl> <dbl>
1 A Apple 4.0 54.05405
2 A Banana 3.4 45.94595
3 B Orange 2.0 100.00000
4 C Apple 6.6 56.89655
5 C Banana 5.0 43.10345
6 D Apple 1.3 16.45570
7 D Orange 6.6 83.54430
8 E Apple 2.0 34.48276
9 E Banana 1.8 31.03448
10 E Orange 2.0 34.48276
3) A 2d layout could be had using xtabs
:
xt <- 100 * prop.table(xtabs(Mass ~ Crate + Fruit, data), 1)
giving:
> xt
Fruit
Crate Apple Banana Orange
A 54.05405 45.94595 0.00000
B 0.00000 0.00000 100.00000
C 56.89655 43.10345 0.00000
D 16.45570 0.00000 83.54430
E 34.48276 31.03448 34.48276
which can be readily be shown in a graph like this:
plot(xt, col = c("red", "yellow", "orange"),
main = "Proportion of Mass of Fruit per Crates")
giving:
The 2d layout can be reformed into long form with ftable
:
ftable(xt, row.vars = 1:2)
giving:
Crate Fruit
A Apple 54.05405
Banana 45.94595
Orange 0.00000
B Apple 0.00000
Banana 0.00000
Orange 100.00000
C Apple 56.89655
Banana 43.10345
Orange 0.00000
D Apple 16.45570
Banana 0.00000
Orange 83.54430
E Apple 34.48276
Banana 31.03448
Orange 34.48276
Note 1: The two lines of code in the question could be written using formula notation like this:
aggregate(Mass ~ Crate, data, sum)
aggregate(Mass ~ Fruit + Crate, data, sum)
Note 2: The input used in reproducible form is:
Lines <- "Fruit Crate Mass
Apple A 4
Banana A 3.4
Orange B 2
Apple C 2.1
Apple C 4.5
Banana C 5
Orange D 1
Apple D 1.3
Orange D 2.4
Orange D 3.2
Orange E 2
Banana E 1.1
Banana E 0.7
Apple E 2"
data <- read.table(text = Lines, header = TRUE)