
R: how to plot the frequency per 2-hour interval from a data frame with ggplot


I have the following dataset:

time          tta    
08:20:00       1
21:30:00       5
22:00:00       1
22:30:00       1
00:25:00       1
17:00:00       5  

I would like to plot a bar chart using ggplot so that the x-axis shows every 2 hours (00:00:00, 02:00:00, 04:00:00 and so on) and the y-axis shows the frequency of the factor tta (1 and 5).

The x-axis labels should be intervals: 00-01, 01-02, and so on.


Solution

  • I first approached this with the xts package, but then found that it does not offer a way to floor times. Hence, I consider lubridate more practical here, also because ggplot does not understand xts objects out of the box. Both packages help you transform time data in many ways.

    Use xts::align.time to shift your times to the next full hour/day/etc., or lubridate::floor_date to shift them to the previous one.
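
    To make the difference concrete, here is a small sketch on a single timestamp (the date 2024-01-01 and tz = "UTC" are arbitrary assumptions, not from the question). align.time is applied to an xts object, as in the xts code further down, and floor_date to the raw POSIXct:

```r
library(xts)
library(lubridate)

# One time from the question on an arbitrary date (assumption), in UTC
stamp <- as.POSIXct("2024-01-01 21:30:00", tz = "UTC")

# align.time moves forward to the next 2-hour mark
up <- index(align.time(xts(1, order.by = stamp), n = 2 * 60 * 60))
# floor_date moves back to the previous 2-hour mark
down <- floor_date(stamp, "2 hours")

format(up, "%H:%M", tz = "UTC")    # "22:00"
format(down, "%H:%M", tz = "UTC")  # "20:00"
```

    As far as I can tell from how they work, align.time computes on seconds since the epoch, so its 2-hour grid is anchored to UTC, while floor_date floors the clock hour in the object's own time zone; in a time zone with an odd-hour UTC offset the two grids would therefore not coincide.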

    Either way, aggregate the data before passing it to ggplot. You can use sum to add up tta, or use length (n() in dplyr) to count the number of occurrences; in the latter case you could also skip the aggregation and use geom_histogram on the times alone. With position_nudge you can shift the bars so that each one covers a period rather than sitting centered on a single point in time. Specify scale_x_time(labels = ..., breaks = ...) in the plot to control the axis labels.
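
    For the counting case, here is a minimal sketch of the geom_histogram route, which needs no prior aggregation. It bins the times into 2-hour bins and stacks the counts by tta level (fill = tta). binwidth is given in seconds because ggplot handles date-times as seconds; boundary = 0 and closed = "left" are my choices so that bins start on even hours and a time such as 22:00 falls into the 22:00-24:00 bin, and tz = "UTC" is an assumption that makes those even hours line up with the clock:

```r
library(ggplot2)

# Sample data from the question; tz = "UTC" is an assumption so that
# bins anchored at the epoch start on even clock hours
time <- as.POSIXct(c("08:20:00", "21:30:00", "22:00:00",
                     "22:30:00", "00:25:00", "17:00:00"),
                   format = "%H:%M:%S", tz = "UTC")
tta <- c(1, 5, 1, 1, 1, 5)
mydf <- data.frame(time = time, tta = factor(tta))

p <- ggplot(mydf, aes(x = time, fill = tta)) +
  geom_histogram(binwidth = 2 * 60 * 60,  # 2 hours, in seconds
                 boundary = 0,            # bin edges at multiples of 2 h
                 closed = "left",         # bins are [start, end)
                 colour = "black") +
  scale_x_datetime(date_labels = "%H:%M")

# Counts computed by ggplot: one row per bin and tta level
binned <- layer_data(p)
print(p)
```

    layer_data(p) is handy for checking that the binning does what you expect before styling the plot.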

    Data:

    time <- c(
      "08:20:00",
      "21:30:00",
      "22:00:00",
      "22:30:00",
      "00:25:00",
      "17:00:00"
    )
    # as.POSIXct() attaches today's date; only the time of day matters here
    time <- as.POSIXct(time, format = "%H:%M:%S")
    tta <- c(1, 5, 1, 1, 1, 5)
    

    Using xts:

    library(xts)
    myxts <- xts(tta, order.by = time)
    myxts_aligned <- align.time(myxts, n = 60*60*2)  # shift all times to the next full 2-hour mark
    myxts_agg <- period.apply(myxts_aligned,
                               INDEX = endpoints(myxts, "hours", 2),
                               FUN = sum)  # sum up tta within every 2-hour period
    library(ggplot2)
    ggplot(mapping = aes(x = index(myxts_agg), y = myxts_agg[, 1])) +
      geom_bar(stat = "identity",
               width = 60*60*2,  # one bar to be 2 hours wide
               position = position_nudge(x = -60*60),  # shift one hour to the left
               # so that the bar represents the actual period
               colour = "black") +
      scale_x_time(labels = function(x) strftime(x, "%H:%M"),
                   breaks = index(myxts_agg)) +  # add more breaks manually if you like
      scale_y_continuous()  # avoids ggplot's "Don't know how to automatically pick scale" message for the xts object
    
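    If you prefer not to hand an xts object to ggplot at all, one option (a sketch, not part of the original answer) is to convert the aggregated series into a plain data frame with index() and coredata() and use scale_x_datetime, ggplot's scale for POSIXct values. It repeats the aggregation so it runs on its own; tz = "UTC" is an assumption for reproducibility:

```r
library(xts)
library(ggplot2)

# Sample data from the question; tz = "UTC" is an assumption
time <- as.POSIXct(c("08:20:00", "21:30:00", "22:00:00",
                     "22:30:00", "00:25:00", "17:00:00"),
                   format = "%H:%M:%S", tz = "UTC")
tta <- c(1, 5, 1, 1, 1, 5)

# Same aggregation as above: sum of tta per 2-hour period
myxts <- xts(tta, order.by = time)
myxts_agg <- period.apply(align.time(myxts, n = 60*60*2),
                          INDEX = endpoints(myxts, "hours", 2),
                          FUN = sum)

# Plain data frame: time = end of each 2-hour period, tta_sum = summed tta
agg_df <- data.frame(time = index(myxts_agg),
                     tta_sum = as.numeric(coredata(myxts_agg)))

# Subtract one hour so each bar is centred on the period it covers
p <- ggplot(agg_df, aes(x = time - 60*60, y = tta_sum)) +
  geom_col(width = 60*60*2, colour = "black") +
  scale_x_datetime(date_labels = "%H:%M")
print(p)
```

    Shifting x inside aes() has the same effect as position_nudge(x = -60*60) and keeps the data frame itself unchanged.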

    Using lubridate:

    library(lubridate)
    library(tidyverse)
    mydf <- data.frame(time = time, tta = tta)
    mydf_agg <-
      mydf %>%
        group_by(time = floor_date(time, "2 hours")) %>%
        summarise(tta_sum = sum(tta), tta_freq = n())
    ggplot(mydf_agg, aes(x = time, y = tta_sum)) +
      geom_bar(stat = "identity",
               width = 60*60*2,  # one bar to be 2 hours wide
               position = position_nudge(x = 60*60),  # shift one hour to the *right*
               # so that the bar represents the actual period
               colour = "black") +
      scale_x_time(labels = function(x) strftime(x, "%H:%M"),
                   breaks = mydf_agg$time)  # add more breaks manually if you like
    
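    To get the interval labels asked for in the question, one option (a sketch, not the answer's method) is a discrete x-axis with one factor level per 2-hour slot, counting occurrences per tta level instead of summing. The labels "00-02", "02-04", ... are my adaptation of the requested "00-01, 01-02" style to 2-hour slots, and tz = "UTC" is again an assumption:

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Sample data from the question; tz = "UTC" is an assumption
time <- as.POSIXct(c("08:20:00", "21:30:00", "22:00:00",
                     "22:30:00", "00:25:00", "17:00:00"),
                   format = "%H:%M:%S", tz = "UTC")
tta <- c(1, 5, 1, 1, 1, 5)

# Interval labels "00-02", "02-04", ..., "22-24" (naming is an assumption)
starts <- seq(0, 22, by = 2)
slot_levels <- sprintf("%02d-%02d", starts, starts + 2)

mydf_slots <- data.frame(time = time, tta = factor(tta)) %>%
  # hour() %/% 2 picks the 2-hour slot; + 1 because R indexes from 1
  mutate(slot = factor(slot_levels[hour(time) %/% 2 + 1],
                       levels = slot_levels)) %>%
  count(slot, tta)  # occurrences per slot and tta level

p <- ggplot(mydf_slots, aes(x = slot, y = n, fill = tta)) +
  geom_col(colour = "black") +
  scale_x_discrete(drop = FALSE)  # keep empty slots on the axis
print(p)
```

    scale_x_discrete(drop = FALSE) keeps the slots without any observations on the axis, so all twelve intervals appear even though only five contain data.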

    In the end, both approaches give almost the same result:

    [Figure: two-hour aggregation with xts and with lubridate]