
Create a summary() table by date in R


This must be simple, but I can't find guidance on how to accomplish it. I've got a very simple data frame (sample below) with a date variable and a temperature variable. The output I want is a data frame with one row per date and a column for each of the outputs of the summary function:

> summary(temp$coretemp)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  103.0   116.0   118.0   117.5   119.0   128.0 

Here's the input data frame (much truncated):

temp <- structure(list(coretemp = c(128L, 128L, 128L, 119L, 116L, 116L, 
116L, 118L, 118L, 120L, 120L, 118L, 113L, 113L, 113L, 118L, 120L, 
123L, 119L, 119L, 117L, 116L, 121L, 121L, 118L, 118L, 118L, 118L, 
119L, 119L, 103L, 116L, 116L, 116L, 120L, 118L, 119L, 119L, 116L, 
115L, 115L, 117L, 117L, 117L, 107L, 114L, 114L, 113L, 118L, 118L
), date = structure(c(19875, 19875, 19875, 19875, 19875, 19875, 
19875, 19875, 19875, 19876, 19876, 19876, 19876, 19876, 19876, 
19876, 19876, 19876, 19876, 19876, 19877, 19877, 19877, 19877, 
19877, 19877, 19877, 19877, 19877, 19877, 19878, 19878, 19878, 
19878, 19878, 19878, 19878, 19878, 19878, 19878, 19879, 19879, 
19879, 19879, 19879, 19879, 19879, 19879, 19879, 19879), class = "Date")), row.names = c(NA, 
-50L), class = "data.frame")

It looks from my googling as though summarise_by_time does what I'm trying to do (see https://search.r-project.org/CRAN/refmans/timetk/html/summarise_by_time.html), but it belongs to the timetk package rather than dplyr, and I can't find a dplyr equivalent.

I've also tried describe.by (from the psych package) thus:

describe.by(temp, group = temp$date)

but it returns a summary of both the date column and the coretemp column, which is unhelpful.


Solution

  • Turned out to be easier than I thought. Here's my dplyr solution:

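    library(dplyr)  # provides %>%, group_by() and summarise()

    # One row per date, one column per statistic of coretemp
    TempSum <- temp %>%
      group_by(date) %>%
      summarise(low = min(coretemp),
                hi = max(coretemp),
                mean = mean(coretemp),
                median = median(coretemp),
                IQR = IQR(coretemp))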
    

    This produced the following output:

    # A tibble: 5 × 6
      date       low    hi  mean median   IQR
      <chr>    <int> <int> <dbl>  <dbl> <dbl>
    1 6/1/2024   116   128  121.    118 12   
    2 6/2/2024   113   123  118.    119  4.5 
    3 6/3/2024   116   121  118.    118  1   
    4 6/4/2024   103   120  116.    116  2.75
    5 6/5/2024   107   118  115     116  3
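
    The question asked for one column per summary() statistic (including the quartiles). A minimal base R sketch of that is below; it is one possible approach, not the only one. The small data frame is a made-up stand-in with the same column names as temp, and per_date, stats and result are illustrative names.

```r
# Made-up stand-in for `temp` (same column names, far fewer rows)
temp <- data.frame(
  date = as.Date(c("2024-06-01", "2024-06-01", "2024-06-01",
                   "2024-06-02", "2024-06-02", "2024-06-02")),
  coretemp = c(128L, 119L, 116L, 118L, 120L, 113L)
)

# Split the temperatures by date and run summary() on each group;
# unclass() turns each result into a plain named numeric vector
per_date <- lapply(split(temp$coretemp, temp$date),
                   function(x) unclass(summary(x)))

# Stack the per-date vectors into a matrix: one row per date,
# one column per statistic (Min., 1st Qu., Median, Mean, 3rd Qu., Max.)
stats <- do.call(rbind, per_date)

# Rebuild a data frame with the date as a proper Date column;
# check.names = FALSE keeps column names such as "1st Qu." unchanged
result <- data.frame(date = as.Date(rownames(stats)), stats,
                     check.names = FALSE)
rownames(result) <- NULL

print(result)
```

    Because the columns keep summary()'s own labels, names containing spaces need backticks or [[ ]] to access, e.g. result[["1st Qu."]].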