
How to apply a summarization measure to matching data.frame columns in R


I have a hypothetical data-frame as follows:

# inventory of goods            
year    category    count-of-good
2010    bikes       1   
2011    bikes       3   
2013    bikes       5   
2010    skates      1   
2011    skates      1   
2013    skates      0   
2010    skis        0   
2011    skis        2
2013    skis        2

My end goal is to show a stacked bar chart of how the %-<good>-of-decade-total has changed from year to year.

Therefore, I want to compute the following:

[image of the desired output table, not preserved]

Now, I should be able to call ggplot(df, aes(factor(year), fill=percent.total.decade.goods)) + geom_bar(), or similar (hopefully!), creating a bar chart where each bar sums to 100%.

However, I'm struggling to determine how to get percent.good.of.decade.total (the far right column) in a non-hacky way. Thanks for your time!


Solution

  • You can use dplyr to compute the sum:

    library("dplyr")
    newDf <- df %>%
      group_by(year) %>%
      mutate(decades.total.goods = sum(count.of.goods)) %>%
      ungroup()

    Then use either mutate or base R syntax to compute the "% good of decade total" by dividing each count by that total.

    Note: you have not shared your exact data-frame, so the names are obviously made up.
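
As a rough sketch of both steps together, assuming a data frame shaped like the table in the question: the column names year, category and count.of.goods are made up here, and percent.good.of.decade.total is simply borrowed from the question's wording.

```r
library(dplyr)

# Hypothetical data frame mirroring the table in the question;
# the column names are assumptions, not the asker's real ones.
df <- data.frame(
  year = rep(c(2010, 2011, 2013), times = 3),
  category = rep(c("bikes", "skates", "skis"), each = 3),
  count.of.goods = c(1, 3, 5, 1, 1, 0, 0, 2, 2)
)

newDf <- df %>%
  group_by(year) %>%                       # one group per year
  mutate(
    # total count of all goods within the year
    decades.total.goods = sum(count.of.goods),
    # share of each good within its year, as a percentage
    percent.good.of.decade.total = 100 * count.of.goods / decades.total.goods
  ) %>%
  ungroup()

print(newDf)
```

Because the grouping is by year, each year's percentages add up to 100, so something like geom_col() with aes(factor(year), percent.good.of.decade.total, fill = category) should give bars that each reach 100%. If the total is meant per category across all years instead, group_by(category) would be the variant to try.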