
Plotting summary statistics


For the following data set,

Genre   Amount
Comedy  10
Drama   30
Comedy  20
Action  20
Comedy  20
Drama   20

I want to build a ggplot2 line graph where the x-axis is Genre and the y-axis is the sum of Amount for each Genre.

I have tried the following:

p = ggplot(test, aes(factor(Genre), Amount)) + geom_point()
p = ggplot(test, aes(factor(Genre), Amount)) + geom_line()
p = ggplot(test, aes(factor(Genre), sum(Amount))) + geom_line()

but to no avail.
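For reference, here is a minimal sketch of the "compute a new data frame first" route. It assumes the question's data is stored in a data frame named test with columns Genre and Amount (the name test comes from the attempts above). It sums Amount per Genre with base R's aggregate() and then draws the line. The aes(group = 1) mapping matters: with a discrete x-axis, geom_line() otherwise treats each Genre as its own one-point group and has nothing to connect.

```r
library(ggplot2)

# The question's data set (assumed to be named `test`)
test <- data.frame(
  Genre  = c("Comedy", "Drama", "Comedy", "Action", "Comedy", "Drama"),
  Amount = c(10, 30, 20, 20, 20, 20)
)

# One row per Genre holding the summed Amount (rows come out sorted by Genre)
totals <- aggregate(Amount ~ Genre, data = test, FUN = sum)

# group = 1 joins the points across the discrete x-axis into a single line
p <- ggplot(totals, aes(x = Genre, y = Amount, group = 1)) +
  geom_line() +
  geom_point()
```

Calling print(p) displays the plot; totals holds Action 20, Comedy 50 and Drama 50.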


Solution

  • If you don't want to compute a new data frame before plotting, you can use stat_summary in ggplot2. For example, if your data set looks like this:

    R> library(ggplot2)
    R> df <- data.frame(Genre=c("Comedy","Drama","Action","Comedy","Drama"),
    R+                  Amount=c(10,30,40,10,20))
    R> df
       Genre Amount
    1 Comedy     10
    2  Drama     30
    3 Action     40
    4 Comedy     10
    5  Drama     20
    

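    In older ggplot2 releases you could use qplot with a stat="summary" argument (qplot's stat argument was later deprecated, and qplot itself is deprecated as of ggplot2 3.4.0):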

    R> qplot(Genre, Amount, data=df, stat="summary", fun.y="sum")
    

    Or add stat_summary to a base ggplot graphic (since ggplot2 3.3.0 the fun.y argument is called fun):

    R> ggplot(df, aes(x=Genre, y=Amount)) + stat_summary(fun=sum, geom="point")
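Since the question asks for a line graph, the same stat_summary idea can draw the summed values as a connected line. This is a sketch assuming ggplot2 3.3.0 or later (for the fun argument) and reusing the df defined above. As with a precomputed data frame, group = 1 is needed to join the discrete Genre positions into one line.

```r
library(ggplot2)

df <- data.frame(Genre  = c("Comedy", "Drama", "Action", "Comedy", "Drama"),
                 Amount = c(10, 30, 40, 10, 20))

# stat_summary() sums Amount within each Genre on the fly;
# group = 1 puts all Genres in one group so geom "line" can connect them
p <- ggplot(df, aes(x = Genre, y = Amount, group = 1)) +
  stat_summary(fun = sum, geom = "line") +
  stat_summary(fun = sum, geom = "point")
```

Here the plotted totals are Action 40, Comedy 20 and Drama 50; print(p) displays the graph.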