Search code examples
rlistdataframedplyrsummarize

R list of data.frames, mean summarise grouping by common factor variable


I have a list of data.frames, with 3 variables each: date, value, key.

I'd like to group each data frame by the key factor, and compute row-wise mean (ie: by date).

Here's a toy example:

set.seed(123)

dates <- seq(from = as.Date("2018-06-18"), to = as.Date("2018-06-22"), length.out = 5)
fake_list <- list(d1 = data.frame(date = dates, value = rnorm(5), key = "aaa"),
              d2 = data.frame(date = dates, value = rnorm(5), key = "bbb"),
              d3 = data.frame(date = dates, value = rnorm(5), key = "aaa"),
              d4 = data.frame(date = dates, value = rnorm(5), key = "bbb"))

The result should be something along the line of:

fake_result <- list(d_aaa = data.frame(date = dates, mean_value = c(0.3318031, 0.06481816, 0.9797399, 0.09059554, -0.2132767)),
                d_bbb = data.frame(date = dates, mean_value = c(1.750989, 0.4793833, -1.615839, 0.0072515, -0.4592267)))

Basically I need to get the mean value by date, aggregating first by the key factor.

Hope this is clear enough, thanks in advance.

Also, dates could be of different length inside the list (data frames with different nrows()), this gives another challenge to the problem..


Solution

  • Here is one option

    library(tidyverse)
    fake_list %>% 
      bind_rows() %>% 
      group_by(key, date) %>% 
      summarise(value = mean(value)) %>%
      ungroup() %>% 
    # as.data.frame() %>%
      split(., .$key) %>% 
      map(., select, -key)
    #$aaa
    # A tibble: 5 x 2
    #  date         value
    #  <date>       <dbl>
    #1 2018-06-18  0.332 
    #2 2018-06-19  0.0648
    #3 2018-06-20  0.980 
    #4 2018-06-21  0.0906
    #5 2018-06-22 -0.213 
    #
    #$bbb
    # A tibble: 5 x 2
    #  date          value
    #  <date>        <dbl>
    #1 2018-06-18  1.75   
    #2 2018-06-19  0.479  
    #3 2018-06-20 -1.62   
    #4 2018-06-21  0.00725
    #5 2018-06-22 -0.459
    

    The idea is to create one data.frame / tibble, do the group_by and summarise operations before we split the data such that we end up with a list.