I have a variable x, for instance size, that is measured on a meaningful numeric scale, so one object can be twice the size of another. I want to relate x to y (some other variable). However, due to sampling, x does not vary continuously but takes only a few discrete values, because there are just a few object types and all objects of the same type have the same size (e.g. sizes of 1, 3 or 10). I want to use geoms like geom_boxplot or geom_violin to display the relationship between x & y.
The problem is this: if I keep x numeric, I get only one boxplot/violin. If I convert it to a factor (shown below), the distance between the geoms does not reflect the distance in x. For instance, the distance between 1 & 3 is the same as the distance between 3 & 10.
Is there a way to discretise the data but change the spacing so it reflects the actual difference on the x-axis and use those geoms?
library(ggplot2)
# Seed for reproducibility
set.seed(20230518)
# Create random data
n <- 10
df <- data.frame(x = factor(rep(c(1, 3, 10), each = n)),
y = c(rnorm(n), rnorm(n), rnorm(n)))
# Box plot version
ggplot(df, aes(x = x, y = y)) + geom_boxplot() + geom_point()
# Violin plot version
ggplot(df, aes(x = x, y = y)) + geom_violin() + geom_point()
The distance between the geoms should reflect the difference in x. The result should look similar to the plot below, just with geom_boxplot & geom_violin in addition to the points:
# Numeric
ggplot(df, aes(x = as.numeric(as.character(x)), y = y)) + geom_point()
Add the missing factor levels, then set drop to FALSE. This works for geom_violin, too.
df$x <- factor(df$x, levels = 1:10)
ggplot(df, aes(x = x, y = y)) +
geom_boxplot() +
geom_point() +
scale_x_discrete(drop = FALSE)
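For the violin case mentioned above, a minimal self-contained sketch could look like the following. It rebuilds the example data from the question and swaps in geom_violin; the name p_violin is only chosen for illustration. Note that padding the factor with levels 1:10 relies on the sizes being integers on an evenly spaced grid, so this is a sketch under that assumption rather than a general solution.

```r
library(ggplot2)

# Same example data as in the question
set.seed(20230518)
n <- 10
df <- data.frame(x = factor(rep(c(1, 3, 10), each = n)),
                 y = rnorm(3 * n))

# Pad the factor with the unobserved sizes (2, 4, ..., 9) so that
# every integer from 1 to 10 gets its own slot on the axis
df$x <- factor(df$x, levels = 1:10)

# p_violin is an illustrative name, not from the original post
p_violin <- ggplot(df, aes(x = x, y = y)) +
  geom_violin() +
  geom_point() +
  scale_x_discrete(drop = FALSE)  # keep the empty levels on the axis
p_violin
```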
Add breaks to hide other x values:
scale_x_discrete(drop = FALSE, breaks = unique(sort(df$x)))
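A different approach, not part of the answer above, is to keep x numeric and tell ggplot2 how to split the data with the group aesthetic; with a continuous x, geom_boxplot and geom_violin only draw one shape unless a grouping is supplied. The sketch below assumes the same example data with x stored as a number; df_num and p_num are illustrative names. It also works when the sizes are not integers, since no factor levels need to be padded.

```r
library(ggplot2)

# Same example data as in the question, but x is kept numeric
set.seed(20230518)
n <- 10
df_num <- data.frame(x = rep(c(1, 3, 10), each = n),
                     y = rnorm(3 * n))

# group = x gives one boxplot per distinct size, placed at its true x position
p_num <- ggplot(df_num, aes(x = x, y = y, group = x)) +
  geom_boxplot() +
  geom_point() +
  scale_x_continuous(breaks = unique(df_num$x))  # label only the observed sizes
p_num
```

Replacing geom_boxplot() with geom_violin() gives the violin version. If the default widths look too wide or too narrow, both geoms accept a width argument.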