Tags: r, ggplot2, bar-chart, smoothing, density-plot

R: smoothing binned data in barplots with ggplot2


The data describe the distribution of commodities (apples and bananas) on the trees along the 4000+ m road between two villages, Villariba and Villabajo. The data are either already binned (i.e. summarized over every 500 m) or come with large location errors, so binning by 500 m is natural. We want to process and plot them as smoothed distributions, after the fact, via kernel smoothing. There are two obvious ways to do this in the ggplot2 package. First, read the data (long format).

library(ggplot2)
databas <- read.csv(text = "dist,stuff,val
500,apples,10
1250,apples,25
1750,apples,55
2250,apples,45
2750,apples,25
3250,apples,10
3750,apples,5
500,bananas,7
1250,bananas,14
1750,bananas,20
2250,bananas,17
2750,bananas,10
3250,bananas,30
3750,bananas,20")

The first try is a boring barplot with geom_col(). Next, we can use two ggplot2 facilities: density plots (geom_density()) and smoothing curves (stat_smooth() or, equivalently, geom_smooth()). The three approaches are implemented as follows:

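    p1 <- ggplot(databas, aes(dist, val, fill = stuff)) +
        geom_col(alpha = 0.5, position = "dodge")         # dodged bars; alpha is a constant, so it is set outside aes()
    p2 <- ggplot(databas, aes(dist, val, fill = stuff)) +
        stat_smooth(method = "gam", se = FALSE, formula = y ~ s(x, k = 7))  # GAM curves only, no fill
    p3 <- ggplot(databas, aes(dist, val, fill = stuff)) +
        geom_density(stat = "identity", alpha = 0.5)      # filled polygons through the binned values, no smoothing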

library(gridExtra)
grid.arrange(p1,p2,p3,nrow=3)

[Figure: the three plots stacked vertically: dodged bars (top), GAM smoothing curves (middle), density plot with stat="identity" (bottom)]

Every method has shortcomings. The superimposed density plot (bottom graph) is the most desirable design, but the option stat="identity" (needed because the data are binned) prevents geom_density() from producing the fine-looking smooth distribution it normally would. stat_smooth() gives almost excellent curves, but they are just curves, with nothing filled underneath. So: how can the coloring of the density plot be combined with the smoothing of the smoothing function? That is, how to either smooth the data in geom_density(), or fill the space under the stat_smooth() curves with semi-transparent colors?


Solution

  • If you like your gam fits, you can use stat = "smooth" within geom_ribbon to draw the curves. The trick is to set ymin to 0 and ymax to after_stat(y) (written ..y.. in ggplot2 versions before 3.3.0), the computed variable created by stat_smooth that holds the predicted line.

    ggplot(databas, aes(x = dist, y = val, fill = stuff)) +
        geom_ribbon(stat = "smooth", aes(ymin = 0, ymax = after_stat(y)), alpha = .5,
                    method = "gam", se = FALSE, formula = y ~ s(x, k = 7))
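
The other route raised in the question, smoothing inside geom_density() itself, can be sketched by treating each bin centre as one observation weighted by its count through the weight aesthetic. This is a minimal sketch and not part of the original answer: the name p_density and the bandwidth bw = 500 (chosen to match the bin width) are assumptions made for illustration. Recent ggplot2 versions normalise the weights within each group, so the y axis shows a density rather than the original val counts; the curves convey shape, not magnitude.

```r
library(ggplot2)

# Same binned data as in the question (long format).
databas <- read.csv(text = "dist,stuff,val
500,apples,10
1250,apples,25
1750,apples,55
2250,apples,45
2750,apples,25
3250,apples,10
3750,apples,5
500,bananas,7
1250,bananas,14
1750,bananas,20
2250,bananas,17
2750,bananas,10
3250,bananas,30
3750,bananas,20")

# Kernel density of the bin centres, weighted by the binned counts.
# bw = 500 is an assumed bandwidth (the bin width), not a tuned value.
p_density <- ggplot(databas, aes(x = dist, weight = val, fill = stuff)) +
  geom_density(alpha = 0.5, bw = 500)
```

Printing p_density draws two overlapping, semi-transparent filled curves. A smaller bw follows the bins more closely, a larger one smooths more; unlike the gam ribbon, the result cannot dip below zero.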
    
