I used the code below to explore box plots with ggplot2.
MyData<-data.frame(CASES=c(rep("Good",10),rep("NotGood",10)), NUMBERS=c(2,3,1,5,6,3,2,6,8,3,1,3,6,8,17,3,2,5,7,20))
library(ggplot2)
MyBoxplot <- ggplot(MyData, aes(x=CASES, y=NUMBERS)) +
  geom_boxplot()
MyBoxplot + geom_jitter(shape=16, position=position_jitter(0.2))
I notice that if a group has no outliers (Good), its box plot shows 10 points, as it should. But if a group has some outliers (NotGood), those outliers are drawn twice.
What is the problem?
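As @stefan said, geom_boxplot() automatically plots the outliers as dots aligned with your x value. Since geom_jitter() then draws every observation, outliers included, each outlier is represented twice: once by the boxplot and once by the jitter layer. The outliers stay in the data either way; you only need to stop geom_boxplot() from drawing them. One way is to make the boxplot's outlier points transparent via outlier.color=NA. Setting outlier.shape=NA, which the ggplot2 documentation suggests for hiding outliers, has the same effect.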
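To see which observations are being drawn twice in the question's data, here is a rough check in base R. It is only a sketch: boxplot.stats() applies the same 1.5 * IQR rule but computes hinges slightly differently from ggplot2's quartiles, so the two can disagree on borderline values. The name outliers_by_case is just illustrative.

```r
# Data from the question
MyData <- data.frame(
  CASES = c(rep("Good", 10), rep("NotGood", 10)),
  NUMBERS = c(2, 3, 1, 5, 6, 3, 2, 6, 8, 3, 1, 3, 6, 8, 17, 3, 2, 5, 7, 20)
)

# For each group, list the values beyond the whiskers (1.5 * IQR rule).
# Note: boxplot.stats() uses hinges, which may differ slightly from the
# quartiles ggplot2 computes, so treat this as an approximation.
outliers_by_case <- lapply(
  split(MyData$NUMBERS, MyData$CASES),
  function(v) boxplot.stats(v)$out
)
outliers_by_case
```

For this data, Good should come back empty and NotGood should list 17 and 20, which are the two points that show up once from geom_boxplot() and once more from geom_jitter().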
As an example, take the following dataset, which has a few outliers. I've changed the shape in geom_jitter() to make it easier to tell which dots come from the boxplot's outliers and which come from geom_jitter().
library(ggplot2)
set.seed(1234)
df <- data.frame(x=rep(c('A','B'), each=100), y=c(rnorm(100, 10, 20), rnorm(100, 30, 50)))
ggplot(df, aes(x, y)) +
  geom_boxplot() +
  geom_jitter(position=position_jitter(0.2), shape=3)
If we set outlier.color=NA, the boxplot's outliers become "transparent", so they no longer appear in the plot:
ggplot(df, aes(x, y)) +
  geom_boxplot(outlier.color=NA) +
  geom_jitter(position=position_jitter(0.2), shape=3)
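Applied to the data from the question, the same idea might look like the sketch below. It uses outlier.shape=NA rather than outlier.color=NA, which should hide the boxplot's outlier points just the same. The height=0 in position_jitter() is my own addition, not part of the original code: it keeps the jitter horizontal only, so every point stays at its true y value.

```r
library(ggplot2)

# Data from the question
MyData <- data.frame(
  CASES = c(rep("Good", 10), rep("NotGood", 10)),
  NUMBERS = c(2, 3, 1, 5, 6, 3, 2, 6, 8, 3, 1, 3, 6, 8, 17, 3, 2, 5, 7, 20)
)

# Hide the outlier points drawn by the boxplot; the jitter layer
# still draws all 20 observations, outliers included.
MyBoxplot <- ggplot(MyData, aes(x = CASES, y = NUMBERS)) +
  geom_boxplot(outlier.shape = NA) +
  # height = 0 is an optional tweak (assumption): jitter horizontally only
  geom_jitter(shape = 16, position = position_jitter(width = 0.2, height = 0))

MyBoxplot
```

With this, each group should show exactly 10 points, and the two large values in NotGood appear only once.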