Tags: r, ggplot2, density-plot

Y-axis changes with bandwidth in geom_density in R


As far as I understand it, the area under a density curve should always be equal to 1. This does not seem to be the case in R.

My code looks like this:

library(ggplot2)

p <- ggplot() +
  geom_density(data = data_plot, aes_string(x = "value", color = group_by),
               position = "identity", size = 0.5, na.rm = TRUE) +
  labs(x = data_plot$unit[data_plot[, group_by] == group_member[1]], y = "density") +
  scale_colour_manual(values = color) +
  theme_own()
plot(p)

When I change the geom_density input into

geom_density(data = data_plot, aes_string(x = "Wert", color = group_by),
               position = "identity", size = 0.5, na.rm = TRUE, bw = bandwidth)

I get different values on the y-axis.

No manual bw: [figure]

Bw = 0.01: [figure]

Bw = 0.00001: [figure]

Am I interpreting something wrong? I did expect the range of the y-axis to get bigger as the bandwidth shrinks (since many values are at 67 and 100), but shouldn't the curves be lower? For example, in the last plot the area is around 30 (x-axis) * 100 (y-axis) = 3,000.


Solution

  • It is true that the total area under a probability density curve is always 1. That constraint does not stop the density values on the y-axis from exceeding 1, though: the area is the height of the curve multiplied by the width of the interval it covers (in general, the integral of the density), so a curve concentrated on a narrow interval must be tall.

    Consider, for example, a uniform distribution ranging from 0 to 0.1. Here, the constant density value would be 10, since 0.1 * 10 = 1.

    # example: the shorter the interval between min and max,
    # the larger the density value becomes
    curve(dunif(x = x, min = 0, max = 0.1), from = 0, to = 0.1)
    

    [Figure: PDF of the uniform distribution from 0 to 0.1]

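    With a smaller bandwidth in your code, each kernel spreads its mass over a narrower interval around the data points, so the peaks at frequently occurring values become taller and narrower while the total area stays 1. The 30 * 100 = 3,000 estimate is the area of the plot's bounding box, not the area under the curve, which consists of very thin spikes.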
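
    To check this numerically, here is a minimal sketch using `stats::density()`, on the assumption that it behaves like the estimate `geom_density()` draws. The sample `x` is made up (many values at 67 and 100, loosely mimicking your description), and `area_under()` is a hypothetical helper that approximates the area with the trapezoidal rule.

    ```r
    # Uniform density on [0, 0.1]: constant height 10, area 1
    unif_area <- integrate(dunif, lower = 0, upper = 0.1, min = 0, max = 0.1)$value

    # Made-up sample (assumption): many repeated values at 67 and 100
    set.seed(1)
    x <- c(rep(67, 40), rep(100, 40), runif(20, min = 70, max = 100))

    # Hypothetical helper: trapezoidal approximation of the area under
    # an estimated density curve returned by density()
    area_under <- function(d) {
      sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
    }

    # Estimate the density with shrinking bandwidths and record
    # the peak height and the approximate area for each
    bandwidths <- c(5, 1, 0.1)
    results <- do.call(rbind, lapply(bandwidths, function(bw) {
      d <- density(x, bw = bw, n = 4096)
      data.frame(bw = bw, peak = max(d$y), area = area_under(d))
    }))

    print(unif_area)
    print(results)
    ```

    The `peak` column should grow as `bw` shrinks, while the `area` column should stay close to 1 in every row (not exactly 1, because the estimate is evaluated on a finite grid and cut off a few bandwidths beyond the data range).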