ggplot time series: messed up x axis for data with missing values


I am creating a time series plot for the following data:

#  Creating data set
year <-  c(rep(2018,4), rep(2019,4), rep(2020,4))
month_1 <-  c(2, 3, 7,  8, 6, 10, 11, 12,  5,  7,  8, 12)
avg_dlt_calc <- c(10, 20, 11, 21, 13,  7, 10, 15,  9, 14, 16, 32)
data_to_plot <- data.frame(cbind(year,month_1,avg_dlt_calc ))



library(ggplot2)

ggplot(data_to_plot, aes(x = month_1)) +
  geom_line(aes(y = avg_dlt_calc), size = 0.5) +
  scale_x_discrete(name = "months", limits = data_to_plot$month_1) +
  facet_grid(~year, scales = "free")

The plot itself is fine, but the x-axis labels are messed up.

How can I fix it?

It is fine not to have labels for missing months (for 2018, for example, only 2, 3, 7 and 8 would be labeled, which makes it clear that data exists only for those months).


Solution

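  • A remedy is to coerce month_1 to a factor and group the observations by year. The group = year aesthetic is needed because, with a discrete x, ggplot2 would otherwise treat each month as its own group and geom_line() would have nothing to connect: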

    ggplot(data_to_plot, aes(x = as.factor(month_1), y = avg_dlt_calc, group = year)) +
      geom_line(size = 0.5) +
      scale_x_discrete(name = "months") +
      facet_grid(~year, scales = "free")
    

    Note that I've moved y = avg_dlt_calc into the aes() call in ggplot(), which is more idiomatic than mapping it inside geom_line(). You can use the breaks argument of scale_x_discrete() to set breaks manually; see ?scale_x_discrete.
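As a rough sketch of setting breaks manually: the breaks argument of scale_x_discrete() takes the factor levels to label, given as character strings. The choice to label every other observed month is purely illustrative, and the names observed_months, shown_breaks and p_breaks are made up for this example.

```r
library(ggplot2)

# Same data as in the question
data_to_plot <- data.frame(
  year = rep(c(2018, 2019, 2020), each = 4),
  month_1 = c(2, 3, 7, 8, 6, 10, 11, 12, 5, 7, 8, 12),
  avg_dlt_calc = c(10, 20, 11, 21, 13, 7, 10, 15, 9, 14, 16, 32)
)

# Illustrative assumption: label only every other month that occurs in the data.
# Breaks must match the factor levels, so they are passed as character strings.
observed_months <- sort(unique(data_to_plot$month_1))
shown_breaks <- as.character(observed_months[c(TRUE, FALSE)])

p_breaks <- ggplot(data_to_plot,
                   aes(x = as.factor(month_1), y = avg_dlt_calc, group = year)) +
  geom_line() +
  scale_x_discrete(name = "months", breaks = shown_breaks) +
  facet_grid(~year, scales = "free")

p_breaks
```

Breaks that do not occur in a given panel are simply not drawn there, so the same vector can be used for all years.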


    I think a fixed x-axis with added points conveys more clearly that data is only available for some periods:

    ggplot(data_to_plot, aes(x = as.factor(month_1), y = avg_dlt_calc, group = year)) +
      geom_line(size = 0.5) +
      geom_point() +
      scale_x_discrete(name = "months") +
      facet_grid(~year, scales = "free_y")
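One possible refinement, offered as a sketch rather than as part of the original answer: with as.factor(month_1), the shared axis contains only months that occur in at least one year, so months 1, 4 and 9 never appear for this data. Assuming months are coded 1 to 12, declaring all twelve levels and passing drop = FALSE to scale_x_discrete() should keep the empty months on the axis. The column name month_f and the object name p_all are made up for this example.

```r
library(ggplot2)

# Same data as in the question
data_to_plot <- data.frame(
  year = rep(c(2018, 2019, 2020), each = 4),
  month_1 = c(2, 3, 7, 8, 6, 10, 11, 12, 5, 7, 8, 12),
  avg_dlt_calc = c(10, 20, 11, 21, 13, 7, 10, 15, 9, 14, 16, 32)
)

# Assumption: months are coded 1..12, so declare all twelve as factor levels
data_to_plot$month_f <- factor(data_to_plot$month_1, levels = 1:12)

p_all <- ggplot(data_to_plot,
                aes(x = month_f, y = avg_dlt_calc, group = year)) +
  geom_line() +
  geom_point() +
  # drop = FALSE keeps factor levels without observations on the axis
  scale_x_discrete(name = "months", drop = FALSE) +
  facet_grid(~year, scales = "free_y")

p_all
```

Every panel then shows months 1 through 12, and the points mark where data actually exists.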
    
