Tags: r, ggplot2, density-plot

How does ggplot2 density differ from the density function?


Why do the following plots look different? Both methods appear to use Gaussian kernels.

How does ggplot2 compute a density?

library(ggplot2)
library(fueleconomy)

# Density computed by stats::density() on the raw cty values
d <- density(vehicles$cty, n=2000)
ggplot(NULL, aes(x=d$x, y=d$y)) + geom_line() + scale_x_log10()

[Plot: density() result drawn on a log10 x axis]

ggplot(vehicles, aes(x=cty)) + geom_density() + scale_x_log10()

[Plot: geom_density() drawn on a log10 x axis]


UPDATE:

A solution to this question already appears on SO here; however, it is still unclear which parameters ggplot2 passes to the R stats density() function.

An alternative solution is to extract the density data directly from the ggplot2 plot, as shown here.
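A minimal sketch of that alternative, assuming a ggplot2 version that provides layer_data(), which returns the data computed for one layer of a built plot. The explicit arguments passed to density() below are an assumption that they mirror stat_density()'s defaults (Gaussian kernel, bw = "nrd0", adjust = 1, n = 512, evaluated over the data range); the all.equal() calls let you check that on your installed version rather than taking it on faith.

```r
library(ggplot2)
library(fueleconomy)

# Plot without any scale transform
p <- ggplot(vehicles, aes(x = cty)) + stat_density(geom = "line")

# layer_data() builds the plot and returns the computed data of layer 1;
# for stat_density, x is the evaluation grid and y is the density
ld <- layer_data(p, 1)

# Assumption: these arguments reproduce stat_density()'s defaults
d <- density(vehicles$cty, bw = "nrd0", adjust = 1, kernel = "gaussian",
             n = 512, from = min(vehicles$cty), to = max(vehicles$cty))

# TRUE (or a small numeric difference) if both computations agree
print(all.equal(ld$x, d$x))
print(all.equal(ld$y, d$y))
```

If both comparisons report TRUE, the parameters above are the ones ggplot2 effectively passes to density() for this data.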


Solution

  • In this case, the density calculation itself is not different; what differs is how the log10 transform is applied.

    First, check that the densities are similar without the transform:

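    library(ggplot2)
    library(fueleconomy)
    
    # density() defaults: Gaussian kernel, bw = "nrd0", n = 512
    d <- density(vehicles$cty, from=min(vehicles$cty), to=max(vehicles$cty))
    ggplot(data.frame(x=d$x, y=d$y), aes(x=x, y=y)) + geom_line()
    # stat_density() uses the same defaults, evaluated over the range of the data
    ggplot(vehicles, aes(x=cty)) + stat_density(geom="line")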
    

    So the difference comes from the transform. With scale_x_log10(), ggplot2 applies the log10 transform to the x variable before stat_density computes the density, because scale transformations happen before statistics are computed. To reproduce the result manually, transform the variable before calculating the density. For example:

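    # Transform first, then estimate the density on the log10 scale
    d2 <- density(log10(vehicles$cty), from=min(log10(vehicles$cty)),
                  to=max(log10(vehicles$cty)))
    # x is in log10 units here, so the axis labels differ but the curve shape matches
    ggplot(data.frame(x=d2$x, y=d2$y), aes(x=x, y=y)) + geom_line()
    # scale_x_log10() transforms cty before stat_density() computes the density
    ggplot(vehicles, aes(x=cty)) + stat_density(geom="line") + scale_x_log10()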
    

    PS: To see how ggplot2 prepares the data for the density, follow the code: as.list(StatDensity) leads to StatDensity$compute_group, which in turn calls ggplot2:::compute_density.
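The same check can be made on the log10 scale. This is a sketch assuming, as described above, that scale_x_log10() hands log10(cty) to the density computation, so layer_data() reports x in log10 units. Back-transforming with 10^x lets the manual curve sit on the same log10 axis as the ggplot2 one. Note that ggplot2:::compute_density is an internal function, so its source may differ between ggplot2 versions.

```r
library(ggplot2)
library(fueleconomy)

# With scale_x_log10(), stat_density() sees log10(cty)
p_log <- ggplot(vehicles, aes(x = cty)) +
  stat_density(geom = "line") +
  scale_x_log10()
ld_log <- layer_data(p_log, 1)

# Manual equivalent: transform first, then estimate the density
lx <- log10(vehicles$cty)
d2 <- density(lx, n = 512, from = min(lx), to = max(lx))

# Compare ggplot2's computed curve with the manual one
print(all.equal(ld_log$x, d2$x))
print(all.equal(ld_log$y, d2$y))

# Reproduce the ggplot2 figure by hand: back-transform x, keep the log axis
p_manual <- ggplot(data.frame(x = 10^d2$x, y = d2$y), aes(x = x, y = y)) +
  geom_line() +
  scale_x_log10()
print(p_manual)

# Print the code path mentioned above
print(ggplot2::StatDensity$compute_group)
print(ggplot2:::compute_density)
```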