I'm relatively new to R and could really use some help with some pretty basic ggplot2 work.
I'm trying to visualize the total number of submissions on a graph, showing the overall (running) total as a line and the daily total as a histogram (or bar graph) on the same plot. I'm not sure how to set breaks or bins on the histogram so that it takes the submission datetime column and makes each bar the daily total.
I tried adding a column that converts the datetime into just date and plots based on that, but I'd really like the line graph to include the time.
Here's what I have so far:
df <- df %>%
  mutate(datetime = lubridate::mdy_hm(datetime)) %>%
  mutate(date = lubridate::as_date(datetime))

# sort by datetime
df <- df %>%
  arrange(datetime)

# add total number of submissions
df <- df %>%
  mutate(total = row_number())

# ggplot
line_plus_histo <- df %>%
  ggplot() +
  geom_histogram(data = df, aes(x = datetime)) +
  geom_line(data = df, aes(x = datetime, y = total), col = "red") +
  stat_bin(data = df, aes(x = date), geom = "bar") +
  labs(
    title = "Submissions by Day",
    x = "Date",
    y = "Submissions",
    legend = NULL)

line_plus_histo
As you can see, I'm also calculating the total number of submissions by sorting by time and then adding a column with the row number. So if you can help me use a better method I'd really appreciate it.
Please find below the line-plus-histogram plot of time vs. submissions:
You can extend your data manipulation like this:
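library(dplyr)
library(lubridate)

df <- df |>
  mutate(datetime = lubridate::mdy_hm(datetime)) |>
  arrange(datetime) |>
  # noon of each submission's day, used as the x position of the daily bar
  mutate(midday = as_datetime(floor_date(as_date(datetime), unit = "day") + 0.5)) |>
  # running total of submissions
  mutate(totals = row_number()) |>
  # number of submissions on each day
  group_by(midday) |>
  mutate(N = n()) |>
  ungroup()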
Then use midday (noon of each day, so that each bar is centered on its day along the datetime axis) for the bars and datetime for the line:
library(ggplot2)

df %>%
  ggplot() +
  geom_bar(data = df, aes(x = midday)) +
  geom_line(data = df, aes(x = datetime, y = totals), col = "red") +
  labs(
    title = "Submissions by Day",
    x = "Date",
    y = "Submissions",
    legend = NULL)
PS. Sorry for the Polish locale on the x-axis.
PS2. With geom_bar it looks much better.
Created on 2022-02-03 by the reprex package (v2.0.1)
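As a possible alternative to the midday column (this is not part of the answer above), geom_histogram can be asked for one-day bins directly, since the bin width on a datetime axis is given in seconds. The sketch below assumes datetime is already a POSIXct column in UTC; the six sample submissions are made up for illustration, and boundary = 0 is an assumption that aligns the bins to midnight UTC.

```r
library(dplyr)
library(ggplot2)

# Hypothetical sample data: six submissions spread over three days (UTC)
df <- tibble(
  datetime = as.POSIXct(
    c("2022-01-01 09:15", "2022-01-01 17:40",
      "2022-01-02 08:05", "2022-01-02 12:30", "2022-01-02 22:10",
      "2022-01-03 11:00"),
    tz = "UTC"
  )
) |>
  arrange(datetime) |>
  mutate(total = row_number())  # running total, as in the question

p <- ggplot(df, aes(x = datetime)) +
  # binwidth is in seconds on a datetime axis: 86400 s = one day;
  # boundary = 0 puts the bin edges at midnight UTC
  geom_histogram(binwidth = 86400, boundary = 0) +
  geom_line(aes(y = total), colour = "red") +
  labs(title = "Submissions by Day", x = "Date", y = "Submissions")
```

Printing p draws the plot. If the timestamps are in another time zone, the bin edges would need a different boundary (for example, the numeric value of a local midnight), so treat boundary = 0 as specific to UTC data.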