Tags: r, dplyr, mapping, purrr

How to mutate to input the mean as a column after using nest?


I'm trying to learn how to use nest(). I want to nest by one of 3 time periods that participants could be in, and I want to add two columns. The first column is the overall mean, which I have figured out. Then I want to nest by the time variable to create 3 datasets (which I have also figured out) and compute the group mean within each one. I read that you should create a function (here, section 6.3.1), but my function keeps failing. How would I do this?

Also, please use nest or nest_by in the solution. I know I could use group_by(), like someone else did here, but in my actual data, I need these to be 3 separate datasets due to other computations that I need to do.

#Here's my setup and sample data
library(dplyr)
library(purrr)
library(tidyr)

set.seed(1414)
test <- tibble(id = c(1:100),
               condition = c(rep(c("pre", "post"), 50)),
               time = c(case_when(condition == "pre" ~ 0,
                                  condition == "post" ~ sample(c(1, 2), size = c(100), replace = TRUE))),
               score = case_when(time == 0 ~ 1,
                                 time == 1 ~ 10,
                                 time == 2 ~ 100))


#Here's what I tried

#Nesting the data (works)
nested_test <- test %>%
  unite(col = "all_combos", c(condition, time)) %>%
  mutate(score2 = mean(score)) %>%
  nest_by(all_combos)

#Make mean function and map it (doesn't work)

my_mean <- function(data) {
  mean(score, na.rm = T)
}

nested_test %>%
  mutate(score3 = map(data, my_mean))

Solution

  • nest_by() returns a rowwise data frame, so we first ungroup() to drop the rowwise attribute. Then we loop over the list-column data with map() and add the new column with mutate() inside each nested dataset.

    library(dplyr)
    library(purrr)
    nested_test_new <- nested_test %>%
      ungroup() %>%
      mutate(data = map(data, ~ .x %>%
        mutate(score3 = mean(score, na.rm = TRUE))))
    

    -output

    nested_test_new
    # A tibble: 3 × 2
      all_combos data             
      <chr>      <list>           
    1 post_1     <tibble [19 × 4]>
    2 post_2     <tibble [31 × 4]>
    3 pre_0      <tibble [50 × 4]>
    > nested_test_new$data
    [[1]]
    # A tibble: 19 × 4
          id score score2 score3
       <int> <dbl>  <dbl>  <dbl>
     1     2    10   33.4     10
     2     4    10   33.4     10
     3    14    10   33.4     10
     4    16    10   33.4     10
     5    18    10   33.4     10
     6    28    10   33.4     10
     7    30    10   33.4     10
     8    32    10   33.4     10
     9    38    10   33.4     10
    10    44    10   33.4     10
    11    48    10   33.4     10
    12    60    10   33.4     10
    13    64    10   33.4     10
    14    78    10   33.4     10
    15    80    10   33.4     10
    16    86    10   33.4     10
    17    92    10   33.4     10
    18    96    10   33.4     10
    19   100    10   33.4     10
    
    [[2]]
    # A tibble: 31 × 4
          id score score2 score3
       <int> <dbl>  <dbl>  <dbl>
     1     6   100   33.4    100
     2     8   100   33.4    100
     3    10   100   33.4    100
     4    12   100   33.4    100
    ...
    

    Another option is nest_mutate() from the nplyr package, which applies mutate() inside each nested dataset:

    library(nplyr)
    test %>%
      unite(col = "all_combos", c(condition, time)) %>%
      mutate(score2 = mean(score)) %>%
      nest(data = -all_combos) %>%
      nest_mutate(data, score3 = mean(score, na.rm = TRUE))
    

    -output

    # A tibble: 3 × 2
      all_combos data             
      <chr>      <list>           
    1 pre_0      <tibble [50 × 4]>
    2 post_1     <tibble [19 × 4]>
    3 post_2     <tibble [31 × 4]>
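
If the goal is to keep a helper like my_mean(), a minimal sketch follows. The original version most likely fails because `score` is looked up as a free-standing object rather than as a column of the data frame passed in, so the helper has to reach into its argument. The `toy` tibble below is a small hypothetical stand-in, not the question's sample data, and both variants put one group mean per row on the outer tibble rather than adding a column inside each nested dataset.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-in data (assumption: same column names as in the question)
toy <- tibble(
  all_combos = c("pre_0", "pre_0", "post_1", "post_1", "post_2"),
  score      = c(1, 1, 10, 10, 100)
)

# nest_by() returns a rowwise data frame with a list-column called `data`
nested_toy <- toy %>% nest_by(all_combos)

# Pull `score` out of the data frame that is passed in
my_mean <- function(data) {
  mean(data$score, na.rm = TRUE)
}

# Variant 1: drop the rowwise grouping, then map over the list-column.
# map_dbl() returns a plain numeric column instead of a list.
via_map <- nested_toy %>%
  ungroup() %>%
  mutate(score3 = map_dbl(data, my_mean))

# Variant 2: keep the rowwise grouping; inside mutate(), `data` then
# refers to the single nested tibble of the current row.
via_rowwise <- nested_toy %>%
  mutate(score3 = my_mean(data)) %>%
  ungroup()
```

Variant 2 avoids map() entirely because rowwise data frames already evaluate mutate() one row at a time. To place the mean inside each nested dataset instead, the map() plus inner mutate() approach shown in the answer still applies.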