Tags: r, ggplot2, data-visualization, histogram

Plotting a line graph by datetime with a histogram/bar graph by date


I'm relatively new to R and could really use some help with some pretty basic ggplot2 work.

I'm trying to visualize total number of submissions on a graph, showing the overall total in a line graph and the daily total in a histogram (or bar graph) on top of it. I'm not sure how to add breaks or bins to the histogram so that it takes the submission datetime column and makes each bar the daily total.

I tried adding a column that converts the datetime into just date and plots based on that, but I'd really like the line graph to include the time.

Here's what I have so far:

df <- df %>%
  mutate(datetime = lubridate::mdy_hm(datetime)) %>%
  mutate(date = lubridate::as_date(datetime))

#sort by datetime 
df <- df %>%
  arrange(datetime)

#add total number of submissions
df <- df %>%
  mutate(total = row_number())

#ggplot
line_plus_histo <- df%>%
  ggplot() +
  geom_histogram(data = df, aes(x=datetime)) +
  geom_line(data = df, aes(x=datetime, y=total), col = "red") +
  stat_bin(data = df, aes(x=date), geom = "bar") +
  labs(
    title="Submissions by Day", 
    x="Date",
    y="Submissions",
    legend=NULL)

line_plus_histo

As you can see, I'm also calculating the total number of submissions by sorting by time and then adding a column with the row number. So if you can help me use a better method I'd really appreciate it.

Below is the line plus histogram of time vs. submissions:

[image: line plus histogram of submissions over time]

Here's the pastebin link with my data


Solution

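  • You can extend your data manipulation as follows:

    library(dplyr)
    library(ggplot2)
    library(lubridate)

    df <- df |>
      mutate(datetime = mdy_hm(datetime)) |>
      arrange(datetime) |>
      # noon of each submission's day, so the daily bar is centred on that day
      mutate(midday = as_datetime(floor_date(as_date(datetime), unit = "day") + 0.5)) |>
      # running total of submissions
      mutate(totals = row_number()) |>
      # number of submissions on each day
      group_by(midday) |>
      mutate(N = n()) |>
      ungroup()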
    

    Then use midday for the bars and datetime for the line:

    df |>
      ggplot() +
      # geom_bar counts the rows per midday value, i.e. the daily total
      geom_bar(aes(x = midday)) +
      # running total plotted against the exact submission time
      geom_line(aes(x = datetime, y = totals), colour = "red") +
      labs(
        title = "Submissions by Day",
        x = "Date",
        y = "Submissions")
    

    PS. Sorry for the Polish locale on the x-axis.

    PS2. It looks much better with geom_bar.

    Created on 2022-02-03 by the reprex package (v2.0.1)
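
For readers without access to the pastebin data, the same idea can be tried on a small made-up data set. This is only a sketch under stated assumptions: the `toy` data frame and the names `daily`, `n_day` and `cumulative` are invented for illustration and are not from the original post. It also swaps in a slightly different technique from the answer: the daily totals are aggregated first with `count()` and drawn with `geom_col()`, rather than letting `geom_bar()` count the rows.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Hypothetical toy data standing in for the pastebin file (not the asker's data)
toy <- data.frame(
  datetime = c("2/1/2022 09:15", "2/1/2022 17:40", "2/2/2022 08:05",
               "2/2/2022 12:30", "2/2/2022 23:10", "2/4/2022 10:00")
)

toy <- toy |>
  mutate(datetime = mdy_hm(datetime)) |>
  arrange(datetime) |>
  mutate(
    total  = row_number(),                              # running total of submissions
    midday = as_datetime(as_date(datetime)) + hours(12) # noon of the submission day
  )

# One row per day: the daily count and the cumulative count up to that day
daily <- toy |>
  count(midday, name = "n_day") |>
  mutate(cumulative = cumsum(n_day))

# Bars use the pre-aggregated daily counts; the line keeps the exact times
p <- ggplot() +
  geom_col(data = daily, aes(x = midday, y = n_day)) +
  geom_line(data = toy, aes(x = datetime, y = total), colour = "red") +
  labs(title = "Submissions by Day", x = "Date", y = "Submissions")

p
```

Aggregating first keeps the per-day numbers available in `daily` for inspection, and `cumulative` gives a day-level cross-check of the row-number running total. Whether this reads better than the `geom_bar()` version is a matter of taste.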