I am playing with box plots and violin plots in ggplot2, but I see some odd behavior that happens only when the number of unique data values is less than four. I am not sure whether SO is the proper place for this question; if not, please guide me to the right place.
library(ggplot2)
df <- data.frame(state = "bedtime", value = 100)
ggplot(aes(x = state, y = value), data = df) + geom_boxplot() + geom_point()
ggplot(aes(x = state, y = value), data = df) + geom_violin()
The violin plot shows nothing, and I receive a warning message.
With only two unique values, the violin may or may not be rendered. If it is not, the result is like the single data point case. If it is rendered, the quantile lines are inconsistent.
df <- data.frame(state = rep("after_meal", 4), value = rep(c(178, 162), each = 2))
ggplot(aes(x = state, y = value), data = df) + geom_boxplot() + geom_point()
ggplot(aes(x = state, y = value), data = df) + geom_violin(draw_quantiles = c(0.25, 0.5, 0.75))
As you can see, the quantile lines are inconsistent with each other.
Is this a problem with geom_violin? Or is it the rule of violin plots?

A violin plot is a density estimate reflected along the vertical axis. It differs from a box plot in that a box plot is drawn from the data itself.
As to your first question: with one point the density is infinite, because you request it at one specific point in space with zero width, i.e. infinite height (to see this, replace geom_violin with geom_density).
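A rough way to see the "zero width, infinite height" argument in base R, without ggplot2: the sketch below assumes a Gaussian kernel centred on the single observation, and the bandwidth values in h are made up purely for illustration. The peak of the kernel grows without bound as the bandwidth shrinks, and the default bandwidth rule in density() cannot be evaluated for a single point at all.

```r
# Peak height of a Gaussian kernel centred on one observation, for ever
# smaller bandwidths h (hypothetical values, chosen only for illustration).
h <- c(1, 0.1, 0.01, 0.001)
peak <- dnorm(0, mean = 0, sd = h)   # kernel height at its own centre
print(data.frame(bandwidth = h, peak_height = peak))

# With the default bandwidth rule, density() raises an error for a single
# data point; tryCatch() captures the message instead of stopping.
res <- tryCatch(density(100), error = function(e) conditionMessage(e))
print(res)
```

The peak height is inversely proportional to the bandwidth, so there is no sensible finite violin to draw for one value.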
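The second issue stems from the same thing. The quantile lines in the violin are taken from the estimated density, not from the raw values, and a density estimate is continuous and not well-defined over a very short range. For a small number of points, the box plot is therefore the more accurate of the two.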
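To make the mismatch concrete, here is a minimal base R sketch using the values from the question. It compares quantiles taken directly from the data with quantiles read off a kernel density estimate. The second half is only an approximation of what I understand geom_violin to do (a cumulative sum over the density grid, trimmed to the data range); treat that as an assumption about the internals, not a copy of them.

```r
# Data from the question: two unique values, each repeated twice.
value <- rep(c(178, 162), each = 2)
probs <- c(0.25, 0.5, 0.75)

# Quantiles computed directly from the data: 162, 170, 178.
sample_q <- quantile(value, probs = probs)

# Quantiles read off a kernel density estimate instead.
# density() defaults to a Gaussian kernel; the grid is limited to the
# data range here to mimic a trimmed violin (an assumption).
dens <- density(value, from = min(value), to = max(value))
cdf <- cumsum(dens$y) / sum(dens$y)            # approximate CDF on the grid
density_q <- approx(cdf, dens$x, xout = probs)$y

print(sample_q)
print(density_q)
```

The outer density-based quantiles fall strictly between 162 and 178, while the data-based ones sit exactly on the two observed values, which is the kind of disagreement visible between the two plots.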