Tags: r, ggplot2, violin-plot

control x axis of a violin plot in ggplot2


I'm generating violin plots in ggplot2 for a time series, year_1 to year_32. The years in my df are stored as numeric values. From the examples I've seen, it seems that I must convert these numeric year values to factors to plot one violin per year; and in fact, if I run the code without as.factor, I get one big fat violin. I would like to understand why geom_violin can't have numeric values on the x axis, or, if I'm wrong about that, how to use them.

So:

my_data$year <- as.factor(my_data$year)

p <- ggplot(data = my_data, aes(x = year, y = continuous_var)) +
  geom_violin(fill = "#FF0000", color = "#000000") +
  ylim(0, 500) +
  labs(x = "x_label", y = "y_label")

p + my_theme()

works fine, but if I skip

my_data$year <- as.factor(my_data$year)

it doesn't work, I get one big fat violin for all years. Why?

TIA


Solution

  • PS: this discussion would be a better fit for Cross Validated, as it's more of a statistics question than a coding one.

    I'm not 100% sure, but here's my explanation: a violin plot shows the density of a set of data, and you can divide your data into groups so that one violin is drawn per group. But if the variable you use to define the groups (the x axis) is continuous, there are infinitely many possible groupings (one group for the values at 0, one for 0.1, one for 0.01, etc.), so in the end you can't actually divide your data, and ggplot probably ignores the x variable for grouping and draws one violin for all your data.
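
To make the grouping point concrete, here is a minimal sketch of one way to keep year numeric and still get one violin per year: set the group aesthetic explicitly. ggplot2 by default forms groups from the discrete variables in the plot, so a purely numeric x leaves everything in a single group. The data frame below is made up for illustration (the column names are borrowed from the question, the values are random), and layer_data() is only used to count the groups ggplot2 ended up with.

```r
library(ggplot2)

# Hypothetical stand-in for my_data: 32 numeric years, 50 random values each.
set.seed(1)
my_data <- data.frame(
  year = rep(1:32, each = 50),
  continuous_var = runif(32 * 50, min = 0, max = 500)
)

# Numeric x and no discrete variable: every row falls into one group,
# so a single violin is drawn.
p_single <- ggplot(my_data, aes(x = year, y = continuous_var)) +
  geom_violin()

# Same numeric x, but with an explicit group: one violin per year,
# and the x axis remains a continuous scale.
p_grouped <- ggplot(my_data, aes(x = year, y = continuous_var, group = year)) +
  geom_violin(fill = "#FF0000", color = "#000000") +
  scale_x_continuous(breaks = seq(1, 32, by = 4))

# Count the groups ggplot2 computed for the violin layer of each plot.
n_single  <- length(unique(layer_data(p_single)$group))
n_grouped <- length(unique(layer_data(p_grouped)$group))
```

Compared with as.factor(year), this keeps the axis continuous, so scale_x_continuous() breaks and limits still apply and unevenly spaced years would be positioned at their true values.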