Let's use the airquality dataset as a base.
myaqm <- melt(airquality, id=c("Month", "Day"), na.rm = TRUE)
This gives me a simplification of my real dataset. So far, I've done this:
myaqm_dcast <- dcast(myaqm, Day+variable~Month, value.var = "value", sum, margins=c("Day", "variable", "Month"))
Which gives me this:
> head(myaqm_dcast, n=10)
Day variable 5 6 7 8 9 (all)
1 1 Ozone 41.0 0.0 135.0 39.0 96.0 311.0
2 1 Solar.R 190.0 286.0 269.0 83.0 167.0 995.0
3 1 Wind 7.4 8.6 4.1 6.9 6.9 33.9
4 1 Temp 67.0 78.0 84.0 81.0 91.0 401.0
5 1 (all) 305.4 372.6 492.1 209.9 360.9 1740.9
6 2 Ozone 36.0 0.0 49.0 9.0 78.0 172.0
7 2 Solar.R 118.0 287.0 248.0 24.0 197.0 874.0
8 2 Wind 8.0 9.7 9.2 13.8 5.1 45.8
9 2 Temp 72.0 74.0 85.0 81.0 92.0 404.0
10 2 (all) 234.0 370.7 391.2 127.8 372.1 1495.8
However, I'm trying to create an additional variables'-percentage-of-days'-subtotal column for each current numeric column. So my goal is something like:
Day variable 5 5(day %) 6 6(day %) 7 7(day %) 8 8(day %) 9 9(day %) (all) (all)(day %)
1 Ozone 41 13.4% 0 0.0% 135 27.4% 39 18.6% 96 26.6% 311 17.9%
1 Solar.R 190 62.2% 286 76.8% 269 54.7% 83 39.5% 167 46.3% 995 57.2%
1 Wind 7.4 2.4% 8.6 2.3% 4.1 0.8% 6.9 3.3% 6.9 1.9% 33.9 1.9%
1 Temp 67 21.9% 78 20.9% 84 17.1% 81 38.6% 91 25.2% 401 23.0%
1 (all) 305.4 100.0% 372.6 100.0% 492.1 100.0% 209.9 100.0% 360.9 100.0% 1740.9 100.0%
2 Ozone 36 15.4% 0 0.0% 49 12.5% 9 7.0% 78 21.0% 172 11.5%
2 Solar.R 118 50.4% 287 77.4% 248 63.4% 24 18.8% 197 52.9% 874 58.4%
2 Wind 8 3.4% 9.7 2.6% 9.2 2.4% 13.8 10.8% 5.1 1.4% 45.8 3.1%
2 Temp 72 30.8% 74 20.0% 85 21.7% 81 63.4% 92 24.7% 404 27.0%
2 (all) 234 100.0% 370.7 100.0% 391.2 100.0% 127.8 100.0% 372.1 100.0% 1495.8 100.0%
Sorry for the terrible formatting! But as you can hopefully see, the new additional columns give a percentage of each variable for that day and that month.
I've found another Stack Overflow helper suggest using tidyr and dplyr but I just couldn't adapt their example to my needs. Would someone please show me what to do?
I wrote a percentage function and used that with dplyr. Then I join the columns together.
pct <- function(x) {x/sum(x)}
df <- myaqm_dcast %>%
filter(variable != "(all)") %>%
group_by(Day) %>%
mutate_each(funs(pct), 3:8) %>%
inner_join(myaqm_dcast, by = c("Day", "variable"))
Edit: You can modify the percentage function to print however you need (*100, paste the % symbol).
Edit 2: If you can live without the (all) rows, I've filtered it out. You can always calculate the column sums using the summarise_each() function.