I recently came across the R package beanplot, which can plot the distributions of two subgroups in a single plot (a special asymmetric beanplot). The package is described in the Journal of Statistical Software and on cran.r-project.org.
I produced an asymmetric beanplot using the following CODE:
library(psych)
library(beanplot)
var1 <-c(20,33,NA,39,NA,40,34,33,NA,38,NA,8,7,NA,NA,40,34,24,25,36,40,37,34,NA,35)
var2 <- c(1,0,1,1,1,0,1,0,1,NA,1,0,0,0,0,1,1,0,1,0,1,1,NA,0,1)
mydata<-data.frame(var1,var2)
table(mydata)
par(lend = 1, mai = c(0.8, 0.8, 0.5, 0.5))
beanplot(var1 ~ var2, data= mydata, side = "both",log="",
what=c(1,1,1,0), border = NA, col = list("black", c("grey", "white")))
legend("bottomleft", fill =c("black", "grey"), legend = c("no", "yes"))
The resulting plot nicely shows the different shapes of the two subgroups' distributions.
PROBLEM
The dependent variable is measured on a scale ranging from 7 to 40. However, the y-axis appears to go from -1 to +55.
It would be great if anyone could explain how the scale is modified, i.e. what is actually plotted here. Is there a way to plot the distribution by using the original scale?
Many many thanks!
beanplot uses density to draw each bean. The estimated density can give mass to areas past the range of the observed data. To get an idea of what density does, try plot(density(1:2)): you should see that it is just an average of Gaussian densities centered at the data points (you can use a different kernel, since beanplot lets you specify a kernel parameter). How the variance of that Gaussian, i.e. the bandwidth, is chosen is up to you, but by default it looks like beanplot uses bw.SJ with the "dpi" method.
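To see this with the data from the question, here is a minimal sketch in base R. It pools the non-missing values of var1 over both groups, which is an assumption made for illustration: beanplot estimates a separate density per group, so the exact numbers will differ. The helper kde_at is hypothetical and only there to show the "average of Gaussians" idea.

```r
# Non-missing values of var1 from the question, pooled over both groups
x <- c(20, 33, 39, 40, 34, 33, 38, 8, 7, 40, 34, 24, 25, 36, 40, 37, 34, 35)

# Gaussian kernel with the Sheather-Jones "dpi" bandwidth
d <- density(x, bw = "SJ-dpi", kernel = "gaussian")

range(x)    # observed data: 7 to 40
range(d$x)  # evaluation grid: wider than the data on both sides

# Hypothetical helper: the estimate at t is the average of Gaussian
# densities centered at the data points, with sd equal to the bandwidth
kde_at <- function(t) mean(dnorm(t, mean = x, sd = d$bw))
kde_at(min(x) - d$bw) > 0  # positive mass below the smallest observation
```

So the scale itself is not modified; the axis is simply wide enough to show the tails of the smoothed estimate.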
You could use the cutmin and cutmax arguments to control the range that beanplot actually plots over, but this only truncates the drawn beans; it does not change the density estimate itself.
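As a sketch of that suggestion, the call from the question can be repeated with cutmin and cutmax set to the scale limits. The values 7 and 40 are taken from the question's description of the scale; everything else is the original code.

```r
library(beanplot)

var1 <- c(20, 33, NA, 39, NA, 40, 34, 33, NA, 38, NA, 8, 7, NA, NA,
          40, 34, 24, 25, 36, 40, 37, 34, NA, 35)
var2 <- c(1, 0, 1, 1, 1, 0, 1, 0, 1, NA, 1, 0, 0, 0, 0,
          1, 1, 0, 1, 0, 1, 1, NA, 0, 1)
mydata <- data.frame(var1, var2)

par(lend = 1, mai = c(0.8, 0.8, 0.5, 0.5))
beanplot(var1 ~ var2, data = mydata, side = "both", log = "",
         what = c(1, 1, 1, 0), border = NA,
         col = list("black", c("grey", "white")),
         cutmin = 7, cutmax = 40)  # clip the drawn beans to the scale limits
legend("bottomleft", fill = c("black", "grey"), legend = c("no", "yes"))
```

The beans now end abruptly at 7 and 40 instead of tapering off. If the axis still shows extra room, passing ylim to beanplot should tighten it, though I have not checked how the two interact.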