Tags: r, for-loop, ggplot2, grouping, frequency

Count mean subgroup occurrence within subgroup


I have the following dataframe:

date        hour_of_day    distance  weather_of_the_day
2017-06-13   6             10.32     1
2017-06-13   8             2.32      1
2017-06-14   10            4.21      2
2017-06-15   7             4.56      4
2017-06-15   7             8.92      4
2017-06-16   22            2.11      3


structure(list(startdat = structure(c(17272, 17272, 17272, 17272,17272, 17272, 17272, 17272, 17272, 17272, 17272, 17272, 17272,17272, 17272, 17272, 17273, 17273, 17273, 17273), class = "Date"),    hOfDay = c(22L, 16L, 12L, 13L, 18L, 19L, 19L, 16L, 22L, 10L, 
10L, 16L, 11L, 20L, 9L, 15L, 18L, 12L, 16L, 18L), tripDKM = c(0.2, 
6.4, 3.4, 0.8, 2.4, 2.2, 2.2, 7.3, 2.6, 3.8, 7.5, 5.8, 3.7, 
2.1, 2.6, 5.2, 2.9, 1.7, 3.2, 3.1), totDMIN = c(1.85, 27.4, 
8.2, 4.21666666666667, 15.65, 8.91666666666667, 11.5666666666667, 
29.5166666666667, 7.01666666666667, 12.2166666666667, 15.8833333333333, 
19.5666666666667, 21.7166666666667, 8.66666666666667, 11.2333333333333, 
13.4, 7.58333333333333, 10.6166666666667, 6.76666666666667, 
17.7), weather_day = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L), .Label = c("1", 
"2", "3", "4"), class = "factor")), row.names = c(1L, 2L,3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 15L, 16L, 17L, 19L, 20L, 21L, 22L), class = "data.frame")

My final goal is a ggplot line chart where the x-axis shows hour_of_day and the y-axis shows the mean number of occurrences. There should be one line for each of the 4 weather conditions. For example, one line represents weather_of_the_day = 1, and its y-value at hour_of_day = 6 shows how often, on average, an observation occurs at hour 6 on days with weather 1, and likewise for hours 7, 8, and so on. I want not just the number of occurrences, but the average number of occurrences.

I've been struggling with this for 2 days. I've tried different approaches, with for loops and subgrouping, but none of them produced a usable solution. Thank you very much in advance for your help!


Solution

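  • Your posted data set is a little small, but this is what I would suggest; it only becomes meaningful with more data points. The code counts the occurrences for each combination of hour and weather condition and draws one line per weather condition. df is the data set you posted, with the column names from your table (hour_of_day, weather_of_the_day).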

    library(dplyr)
    library(ggplot2)
    
    # Count the rows for each hour / weather combination
    df_plot <- df %>%
      mutate(weather_of_the_day = factor(weather_of_the_day)) %>%
      group_by(hour_of_day, weather_of_the_day) %>%
      summarize(occurrences = n(), .groups = "drop")
    
    # One line per weather condition, hours on the x-axis
    ggplot(data = df_plot,
           aes(x = hour_of_day,
               y = occurrences,
               group = weather_of_the_day,
               color = weather_of_the_day)) +
      geom_line() +
      geom_point()
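
    The code above plots raw counts. If what you want is the average number of occurrences per day, one possible reading (an assumption on my part, not something the question spells out) is to divide each hour/weather count by the number of distinct days that had that weather. Below is a minimal sketch of that idea. It rebuilds the six rows from the table in the question; the names days_per_weather, n_days, df_mean and mean_occurrences are made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Sample rows copied from the table in the question
df <- data.frame(
  date = as.Date(c("2017-06-13", "2017-06-13", "2017-06-14",
                   "2017-06-15", "2017-06-15", "2017-06-16")),
  hour_of_day = c(6, 8, 10, 7, 7, 22),
  distance = c(10.32, 2.32, 4.21, 4.56, 8.92, 2.11),
  weather_of_the_day = c(1, 1, 2, 4, 4, 3)
)

# Number of distinct days observed for each weather condition
days_per_weather <- df %>%
  group_by(weather_of_the_day) %>%
  summarize(n_days = n_distinct(date), .groups = "drop")

# Assumed definition of "mean occurrences":
# rows per weather/hour divided by the number of days with that weather
df_mean <- df %>%
  count(weather_of_the_day, hour_of_day, name = "occurrences") %>%
  left_join(days_per_weather, by = "weather_of_the_day") %>%
  mutate(mean_occurrences = occurrences / n_days,
         weather_of_the_day = factor(weather_of_the_day))

# Same plot as before, with the per-day average on the y-axis
p <- ggplot(df_mean,
            aes(x = hour_of_day,
                y = mean_occurrences,
                group = weather_of_the_day,
                color = weather_of_the_day)) +
  geom_line() +
  geom_point()
p
```

    Dividing by the number of days per weather condition means that a day on which nothing happened at a given hour still counts as a zero in the average. With only a handful of rows most weather conditions have a single point, so the lines only start to appear once there are more days in the data.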