
How can I work with stat_density and a time series (POSIXct on the x axis)?


Based on this example:

#example from https://ggplot2.tidyverse.org/reference/geom_tile.html
cars <- ggplot(mtcars, aes(mpg,factor(cyl)))
cars + stat_density(aes(fill = after_stat(density)), geom = "raster", position = "identity")

I wanted to create a plot with the density plotted vertically per hour of my dataset. The original dataset is very long. I also want to display the single data points and a mean as a line.

Here is a simplified basic version of the code:

#reproducible example for density plot
library(reshape2)
library(ggplot2)
library(scales)

startdate <- as.POSIXct("2020-01-01 01:00", tz="UTC")
enddate <- as.POSIXct("2020-01-01 05:00", tz="UTC")

#dataframe
df <- data.frame(x = seq.POSIXt(startdate, enddate, "hour"),
                 y1 = c(1,2,3,4,5),
                 y2 = c(2,4,6,8,10),
                 y3 = c(3,6,9,12,15))
df$mean <- rowMeans(df[,-1])
df_melt <- melt(df, id.vars = 1, measure.vars = c(2,3,4))

#plot
g1 <- ggplot(data = df_melt, aes(factor(x), value)) +
  stat_density(aes(fill = after_stat(ndensity)),
               geom = "raster", position = "identity", orientation = "y") +
  geom_point()
g1

This works, but the original dataset spans so many hours that the x-axis labels become unreadable. I also want to control the date format of the labels and the limits of the plot. Before working with stat_density, I did that with scale_x_datetime. For the density plot, however, I have to use factor(x) instead of the original x, which is POSIXct. So the following scale produces an error, because x is a factor and not a date-time:

#scale x datetime (does not work)
g1 <- g1 + scale_x_datetime(labels = date_format("%b/%d", tz="UTC"),
                   limits = c(startdate, enddate),
                   breaks = function(x)
                     seq.POSIXt(from = startdate, to = enddate, by = "2 days"),
                   date_minor_breaks = "12 hours",
                   expand = c(0,0))
g1

I managed to use scale_x_discrete, but this makes it hard to control the label format and limits with the bigger dataset:

#scale x discrete
g1 <- g1 + scale_x_discrete(limits = c(as.character(df$x)),
                            breaks = as.character(df$x)[c(2,4)])
g1

The next problem with factors is that I cannot add the mean of every hour with geom_line, because each factor level contains only one observation. The points are drawn, but the line is not:

#plot mean
g1 + geom_point(data = df, aes(factor(x), mean), col = "red")
g1 + geom_line(data = df, aes(factor(x), mean), col = "red")

So, is there a way to produce the desired plot with the density per hour, overplotted points and an overplotted mean line? I would also like to edit the x labels and limits as conveniently as possible. Maybe there is a way to use x instead of factor(x)...


Solution

  • I think the solution might be as simple as dropping the factor() and setting an explicit group in the density layer. Does the following work for your real case?

    library(reshape2)
    library(ggplot2)
    library(scales)
    #> Warning: package 'scales' was built under R version 4.0.3
    
    startdate <- as.POSIXct("2020-01-01 01:00", tz="UTC")
    enddate <- as.POSIXct("2020-01-01 05:00", tz="UTC")
    
    #dataframe
    df <- data.frame(x = seq.POSIXt(startdate, enddate, "hour"),
                     y1 = c(1,2,3,4,5),
                     y2 = c(2,4,6,8,10),
                     y3 = c(3,6,9,12,15))
    df$mean <- rowMeans(df[,-1])
    df_melt <- melt(df, id.vars = 1, measure.vars = c(2,3,4))
    
    #plot
    ggplot(data = df_melt, aes(x, value)) +
      stat_density(aes(fill = after_stat(ndensity),
                       group = x),
                   geom = "raster", position = "identity", orientation = "y") +
      geom_point()
    

    Created on 2021-01-29 by the reprex package (v0.3.0)
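
Since x stays POSIXct in the plot above, the remaining two wishes from the question (a mean line and a date-time axis) should follow with the usual tools. The sketch below is untested against the real, longer dataset. The name `p`, the half-hour padding of the limits and the "%H:%M" label format are illustrative choices, not part of the original answer.

```r
library(reshape2)
library(ggplot2)

startdate <- as.POSIXct("2020-01-01 01:00", tz = "UTC")
enddate   <- as.POSIXct("2020-01-01 05:00", tz = "UTC")

# same toy data as in the question
df <- data.frame(x = seq.POSIXt(startdate, enddate, "hour"),
                 y1 = c(1, 2, 3, 4, 5),
                 y2 = c(2, 4, 6, 8, 10),
                 y3 = c(3, 6, 9, 12, 15))
df$mean <- rowMeans(df[, -1])
df_melt <- melt(df, id.vars = 1, measure.vars = c(2, 3, 4))

p <- ggplot(df_melt, aes(x, value)) +
  # one density per time stamp: group = x replaces the factor(x) trick
  stat_density(aes(fill = after_stat(ndensity), group = x),
               geom = "raster", position = "identity", orientation = "y") +
  geom_point() +
  # x is continuous now, so all means fall into one group and get connected
  geom_line(data = df, aes(x, mean), colour = "red") +
  # assumption: pad the limits by half an hour (1800 s) so that the first
  # and last raster columns are not cut in half
  scale_x_datetime(limits = c(startdate - 1800, enddate + 1800),
                   date_breaks = "1 hour",
                   date_labels = "%H:%M",
                   timezone = "UTC",
                   expand = c(0, 0))
p
```

For the long dataset, something like `date_breaks = "2 days"` and `date_labels = "%b/%d"` would probably come closer to the labels attempted in the question.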