The data describe the distribution of commodities (apples and bananas) on the trees along the road between two villages, Villariba and Villabajo, which is 4000+ m long. The data are either already binned (i.e. summarized over every 500 m) or come with large location errors, so binning by 500 m is natural. We want to process and plot them as smoothed post factum distributions via kernel smoothing. There are two obvious ways to do this in the ggplot2 package. First, read the data (long format).
library(ggplot2)
databas<-read.csv(text="dist,stuff,val
500,apples,10
1250,apples,25
1750,apples,55
2250,apples,45
2750,apples,25
3250,apples,10
3750,apples,5
500,bananas,7
1250,bananas,14
1750,bananas,20
2250,bananas,17
2750,bananas,10
3250,bananas,30
3750,bananas,20")
The first try is a boring barplot with geom_col(). Next, we can use two ggplot2 facilities: density plots (geom_density()) and smoothing curves (stat_smooth() or, equivalently, geom_smooth()). The three approaches are implemented as follows:
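# Bar plot; alpha is a fixed setting, so it goes outside aes() to avoid a spurious legend
p1<-ggplot(databas,aes(dist,val,fill=stuff))+geom_col(alpha=0.5,position="dodge")
# GAM smoothing curves, one per commodity (lines only, no fill)
p2<-ggplot(databas,aes(dist,val,fill=stuff))+stat_smooth(aes(y=val,x=dist),method="gam",se=FALSE,formula=y~s(x,k=7))
# Filled "density" drawn directly from the binned values
p3<-ggplot(databas,aes(dist,val,fill=stuff))+geom_density(stat="identity",alpha=0.5)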
library(gridExtra)
grid.arrange(p1,p2,p3,nrow=3)
Every method has shortcomings. The superimposed density plot (bottom graph) is the design we want most, but the option stat="identity" (needed because the data are binned) prevents it from producing the nice smooth distribution it normally would. The stat_smooth() option gives almost excellent curves, but these are just curves. So: how can we combine the coloring of the density plot with the smoothing of the smoothing function? That is, either smooth the data in geom_density(), or fill the space under the stat_smooth() curves with semi-transparent colors?
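If you like your gam fits, you can use stat = "smooth" within geom_ribbon to draw the curves. The trick is to set ymin to 0 and ymax to ..y.., the special variable computed by stat_smooth that holds the predicted line. (In ggplot2 3.3.0 and later the same variable can be written as after_stat(y); the ..y.. notation is deprecated as of 3.4.0.)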
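# ymax follows the fitted smooth; after_stat(y) is the current spelling of the older ..y..
ggplot(databas, aes(x = dist, y = val, fill = stuff)) +
  geom_ribbon(stat = "smooth", aes(ymin = 0, ymax = after_stat(y)), alpha = 0.5,
              method = "gam", se = FALSE, formula = y ~ s(x, k = 7))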
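If you would rather see (and control) what the smoother is doing, the same picture can be built by fitting the GAMs yourself and passing the predictions to geom_area(). The following is only a minimal sketch under some assumptions: it uses mgcv::gam() with the same formula as above, picks method = "REML" as my own choice (it may not match exactly what stat_smooth does in your ggplot2 version), and clamps the fit at zero because a smooth can dip below zero where the counts are small. The names smooth_one, smoothed and p4 are made up for illustration.

```r
library(ggplot2)
library(mgcv)

# Same binned data as in the question.
databas <- read.csv(text = "dist,stuff,val
500,apples,10
1250,apples,25
1750,apples,55
2250,apples,45
2750,apples,25
3250,apples,10
3750,apples,5
500,bananas,7
1250,bananas,14
1750,bananas,20
2250,bananas,17
2750,bananas,10
3250,bananas,30
3750,bananas,20")

# Hypothetical helper: fit one GAM to a single commodity and
# predict it on a fine grid of n distances.
smooth_one <- function(d, n = 200) {
  fit <- gam(val ~ s(dist, k = 7), data = d, method = "REML")  # REML is an assumption
  grid <- data.frame(dist = seq(min(d$dist), max(d$dist), length.out = n))
  grid$val <- as.numeric(predict(fit, newdata = grid))
  grid$val <- pmax(grid$val, 0)  # counts cannot be negative, so clamp at zero
  grid$stuff <- d$stuff[1]
  grid
}

# One smoothed curve per commodity, stacked back into long format.
smoothed <- do.call(rbind, lapply(split(databas, databas$stuff), smooth_one))

# position = "identity" overlays the areas instead of stacking them.
p4 <- ggplot(smoothed, aes(dist, val, fill = stuff)) +
  geom_area(alpha = 0.5, position = "identity")
print(p4)
```

The geom_ribbon version is shorter; this one is mainly useful if you want to post-process the fitted values (clamping, rescaling, exporting) before plotting.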