r ggplot2 statistics tidyverse frequency

ggplot2: add line for average count values resulting from geom_freqpoly


I am trying to add an additional line to my geom_freqpoly plot that represents the average count per bin. I tried two different approaches, but neither was successful.

  1. I tried adding the line as a geom_line but got an error asking whether I mapped my stat in the wrong layer.
library(tidyverse)

iris %>% 
  ggplot(aes(x = Petal.Length, y = ..count..)) +
  geom_freqpoly(aes(color = Species), 
                binwidth = 0.2) +
  geom_line(aes(yintercept = "mean"))
#> Warning: Ignoring unknown aesthetics: yintercept
#> Error: Aesthetics must be valid computed stats. Problematic aesthetic(s): y = ..count... 
#> Did you map your stat in the wrong layer?
  2. I tried adding another geom_freqpoly, like:
library(tidyverse)

iris %>% 
  ggplot() +
  geom_freqpoly(aes(x = Petal.Length, y = ..count.., color = Species), 
                binwidth = 0.2) +
  geom_freqpoly(aes(x = Petal.Length, y = mean(..count..), color = "red"), binwidth = 0.2)

But the resulting line is not what I expect.

[Image: wrong result]

Using the iris dataset, I would expect the new line to represent the average count across Species for each bin (see the drawing below), which is not what I am getting. My understanding is that geom_freqpoly divides a continuous variable (like Petal.Length) into bins of fixed width (0.2 in this case). So for each bin I want the average of the per-species counts, and a line connecting those points.

[Drawing: expected result]

Created on 2020-05-23 by the reprex package (v0.3.0)


Solution

  • Based on the edit to your question, maybe this is what you expected.

    The problem with your approach is that mean(..count..) computes the mean of the whole ..count.. vector, which is a single number, so you get a horizontal line. Instead, divide the total count in each bin by the number of groups (here, the number of Species).

    I'm not completely satisfied with my solution because I would like to avoid the explicit call to n_distinct(iris$Species). I tried some approaches with tapply but failed. So as a first step ...

    library(tidyverse)
    
    ggplot(iris, aes(x = Petal.Length)) +
      geom_freqpoly(aes(color = Species), binwidth = .2) +
      geom_freqpoly(aes(y = ..count.. / n_distinct(iris$Species)), color = "red", binwidth = .2)
    

    Created on 2020-05-28 by the reprex package (v0.3.0)
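
One possible way to avoid the explicit n_distinct(iris$Species) is sketched below. It is not part of the original answer: it builds the per-species plot first, pulls the binned counts back out with layer_data(), averages them per bin with dplyr, and draws the result with geom_line. The names p and avg are made up for illustration. The sketch assumes every species is binned against the same breaks, so that the x values line up across groups; that appears to hold when binwidth is set, but it is worth checking on your own data.

```r
library(ggplot2)
library(dplyr)

# Per-species frequency polygons, as in the question
p <- ggplot(iris, aes(x = Petal.Length)) +
  geom_freqpoly(aes(color = Species), binwidth = 0.2)

# layer_data() returns the data computed by stat_bin for layer 1:
# one row per bin and group, with the bin center in `x` and the count in `count`
avg <- layer_data(p, 1) %>%
  group_by(x) %>%                    # assumption: bin centers are shared by all groups
  summarise(count = mean(count))     # average count across groups in each bin

# Overlay the per-bin average as a plain line
p + geom_line(data = avg, aes(x = x, y = count), color = "red")
```

The averaged line should match the one produced by dividing by the number of species, but the number of groups is now taken from the plot data instead of being written into the aesthetic.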