Based on this example:
#example from https://ggplot2.tidyverse.org/reference/geom_tile.html
cars <- ggplot(mtcars, aes(mpg,factor(cyl)))
cars + stat_density(aes(fill = after_stat(density)), geom = "raster", position = "identity")
I wanted to create a plot with the density plotted vertically per hour of my dataset. The original dataset is very long. I also want to display the single data points and a mean as a line.
Here is a simplified basic version of the code:
#reproducible example for density plot
library(reshape2)
library(ggplot2)
library(scales)
startdate <- as.POSIXct("2020-01-01 01:00", tz="UTC")
enddate <- as.POSIXct("2020-01-01 05:00", tz="UTC")
#dataframe
df <- data.frame(x = seq.POSIXt(startdate, enddate, "hour"),
                 y1 = c(1,2,3,4,5),
                 y2 = c(2,4,6,8,10),
                 y3 = c(3,6,9,12,15))
df$mean <- rowMeans(df[,-1])
df_melt <- melt(df, id.vars = 1, measure.vars = c(2,3,4))
#plot
g1 <- ggplot(data = df_melt, aes(factor(x), value)) +
  stat_density(aes(fill = after_stat(ndensity)),
               geom = "raster", position = "identity", orientation = "y") +
  geom_point()
g1
This works, but the original dataset spans so many hours that the x-axis labels become unreadable. I also want to control the date format of the labels and the limits of the plot. Before working with stat_density, I used to do that with scale_x_datetime. But for the density plot I have to use factor(x) instead of the original x, which is POSIXct. So the following scaling produces an error, because x is a factor and not a date-time:
#scale x datetime (does not work)
g1 <- g1 + scale_x_datetime(labels = date_format("%b/%d", tz="UTC"),
                            limits = c(startdate, enddate),
                            breaks = function(x)
                              seq.POSIXt(from = startdate, to = enddate, by = "2 days"),
                            date_minor_breaks = "12 hours",
                            expand = c(0,0))
g1
I managed to get it working with scale_x_discrete, but this makes it hard to control the label format and limits with the bigger dataset:
#scale x discrete
g1 <- g1 + scale_x_discrete(limits = c(as.character(df$x)),
                            breaks = as.character(df$x)[c(2,4)])
g1
The next problem with factors is that I cannot add the mean of every hour with geom_line, because every factor level contains only one observation, so there is nothing to connect within a group.
#plot mean
g1 + geom_point(data = df, aes(factor(x), mean), col = "red")
g1 + geom_line(data = df, aes(factor(x), mean), col = "red")
So, is there a way to produce the desired plot with density per hour, overplotted points and overplotted mean line? And I want to edit the x labels and limits as comfortably as possible. Maybe there is a way to use x instead of factor(x)...
I think the solution might be as simple as dropping the factor() and setting an explicit group in the density. Does the following work for your real case?
library(reshape2)
library(ggplot2)
library(scales)
startdate <- as.POSIXct("2020-01-01 01:00", tz="UTC")
enddate <- as.POSIXct("2020-01-01 05:00", tz="UTC")
#dataframe
df <- data.frame(x = seq.POSIXt(startdate, enddate, "hour"),
                 y1 = c(1,2,3,4,5),
                 y2 = c(2,4,6,8,10),
                 y3 = c(3,6,9,12,15))
df$mean <- rowMeans(df[,-1])
df_melt <- melt(df, id.vars = 1, measure.vars = c(2,3,4))
#plot
ggplot(data = df_melt, aes(x, value)) +
  stat_density(aes(fill = after_stat(ndensity),
                   group = x),
               geom = "raster", position = "identity", orientation = "y") +
  geom_point()
Created on 2021-01-29 by the reprex package (v0.3.0)
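Since x stays POSIXct with this approach, the mean line and the date-time axis from the question should also become available. The following is only a sketch on the toy data and has not been tried on the real dataset. The names df_long and g2 are made up for illustration, the long format is built with base R instead of reshape2 to keep the snippet self-contained, and the label format and break spacing are placeholder choices. Limits are deliberately left out, because raster tiles extend half a step beyond the first and last time stamp and tight limits may drop them.

```r
library(ggplot2)

startdate <- as.POSIXct("2020-01-01 01:00", tz = "UTC")
enddate   <- as.POSIXct("2020-01-01 05:00", tz = "UTC")

# same toy data as above
df <- data.frame(x = seq(startdate, enddate, by = "hour"),
                 y1 = c(1, 2, 3, 4, 5),
                 y2 = c(2, 4, 6, 8, 10),
                 y3 = c(3, 6, 9, 12, 15))
df$mean <- rowMeans(df[, -1])

# long format built by hand (df_long is a hypothetical name)
df_long <- data.frame(x = rep(df$x, times = 3),
                      value = c(df$y1, df$y2, df$y3))

g2 <- ggplot(df_long, aes(x, value)) +
  # one density per time stamp, evaluated along y
  stat_density(aes(fill = after_stat(ndensity), group = x),
               geom = "raster", position = "identity", orientation = "y") +
  geom_point() +
  # x is continuous now, so the hourly means can be connected
  geom_line(data = df, aes(x, mean), colour = "red") +
  # label format and break spacing are assumptions; adjust as needed
  scale_x_datetime(date_labels = "%H:%M", date_breaks = "2 hours",
                   timezone = "UTC")
g2
```

For the real data, date_labels = "%b/%d" and date_breaks = "2 days" would probably be closer to the scale_x_datetime call in the question.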