Tags: r, dplyr, tidyverse

Period (`.`) in dplyr::summarise for referencing table within groups


I have the following table:

# Inputs
library(dplyr)

set.seed(123)

df <- tibble(g = rep(x = c("A", "B", "C"), times = c(3, 5, 7)),
             a = sample(x = 1:100, size = 15), 
             b = sample(x = 1:100, size = 15))

df

# A tibble: 15 × 3
   g         a     b
   <chr> <int> <int>
 1 A        87     8
 2 A        35    51
 3 A        40    74
 4 B        30    50
 5 B        12    98
 6 B        31    86
 7 B        97    76
 8 B        64    84
 9 C        14    46
10 C        71    17
11 C        67    62
12 C        23    92
13 C        79    54
14 C        85    35
15 C        37    79

I also define a function (kept very simple for this example) that takes a whole data frame and returns a single numeric value:

# Function
myFUN <- function(x){
  mean(x$a + x$b)
}

I want to use group_by and summarise to get a table with one result per group.

I thought it could be as simple as doing the following:

# What I got (incorrect results)
df %>%
  group_by(g) %>%
  summarise(res = myFUN(.))

# A tibble: 3 × 2
  g       res
  <chr> <dbl>
1 A      112.
2 B      112.
3 C      112.

But as you can see, all the res values are the same, because `.` refers to the whole initial table and not to the subset of rows within each group.
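To make this concrete, here is a minimal sketch. The values are typed in from the table above rather than sampled, so it does not depend on the random number generator, and the column names n_dot and n_group are only illustrative. Counting rows through `.` reports the size of the whole table for every group, while n() reports the size of the current group:

```r
library(dplyr)

# Same values as the table printed above, entered by hand
df <- tibble(
  g = rep(c("A", "B", "C"), times = c(3, 5, 7)),
  a = c(87, 35, 40, 30, 12, 31, 97, 64, 14, 71, 67, 23, 79, 85, 37),
  b = c(8, 51, 74, 50, 98, 86, 76, 84, 46, 17, 62, 92, 54, 35, 79)
)

# The pipe binds `.` to its left-hand side, i.e. the whole grouped table,
# so nrow(.) is 15 in every group; n() is evaluated per group (3, 5, 7).
whole <- df %>%
  group_by(g) %>%
  summarise(n_dot = nrow(.), n_group = n())

whole
```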

Here is the expected result, computed with a loop:

# Expected
out <- list()
for(i in unique(df$g)){
  out[[length(out) + 1]] <- tibble(g = i,
                                   res = df %>% filter(g == i) %>% myFUN)
}

out |> bind_rows()

# A tibble: 3 × 2
  g       res
  <chr> <dbl>
1 A      98.3
2 B     126. 
3 C     109.

Solution (taken from the answer of @Limey)

df %>%
  group_by(g) %>%
  group_map(.f = function(.x, .y){
    .x %>%
      summarise(res = myFUN(.)) %>%
      mutate(g = .y$g, .before = 1)
  }) %>%
  bind_rows()
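A possible variant of the same idea, offered as a sketch rather than a definitive answer: group_modify() also calls a function once per group, but it expects that function to return a data frame and adds the group key back itself, so the mutate and bind_rows steps might not be needed. The data are typed in from the table above, and res_modify is just an illustrative name.

```r
library(dplyr)

# Same values as the table printed in the question, entered by hand
df <- tibble(
  g = rep(c("A", "B", "C"), times = c(3, 5, 7)),
  a = c(87, 35, 40, 30, 12, 31, 97, 64, 14, 71, 67, 23, 79, 85, 37),
  b = c(8, 51, 74, 50, 98, 86, 76, 84, 46, 17, 62, 92, 54, 35, 79)
)

# Function from the question: takes a data frame, returns one number
myFUN <- function(x) {
  mean(x$a + x$b)
}

# .x is the subset of rows for the current group; the returned one-row
# tibble gets the grouping column g prepended by group_modify().
res_modify <- df %>%
  group_by(g) %>%
  group_modify(~ tibble(res = myFUN(.x))) %>%
  ungroup()

res_modify
```

With these inputs the three values should match the loop output above (about 98.3, 126 and 109).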

Solution

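  • I don't get your expected results, but I think this gives you what you want. Inside group_map, .x takes the place of your `.`, because group_map expects a function of two arguments: .x (the rows of the current group) and .y (a one-row tibble holding the group key).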

    df %>% 
      group_by(g) %>% 
      group_map(
        function(.x, .y) {
          .x %>% 
            summarise(res = mean(a + b)) %>% 
            tibble::add_column(g = .y$g, .before = 1)
        }
      ) %>% bind_rows()
    # A tibble: 3 × 2
      g       res
      <chr> <dbl>
    1 A      78.7
    2 B     101. 
    3 C     117. 
    

    And, of course, this can be reduced to

    df %>% 
      group_by(g) %>% 
      summarise(res = mean(a + b))
    

    (with identical results) in this case, but I accept you have simplified your real use-case to produce your MWE.
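If the real function does need the whole per-group data frame, another option might be to stay inside summarise and hand it the current group's columns with pick(). This is a sketch that assumes dplyr 1.1.0 or later, where pick() is available; the data are typed in from the question's printed table, and res_pick is only an illustrative name.

```r
library(dplyr)

# Values copied from the table printed in the question
df <- tibble(
  g = rep(c("A", "B", "C"), times = c(3, 5, 7)),
  a = c(87, 35, 40, 30, 12, 31, 97, 64, 14, 71, 67, 23, 79, 85, 37),
  b = c(8, 51, 74, 50, 98, 86, 76, 84, 46, 17, 62, 92, 54, 35, 79)
)

# Function from the question: takes a data frame, returns one number
myFUN <- function(x) {
  mean(x$a + x$b)
}

# pick(everything()) returns the non-grouping columns of the current group
# as a tibble, so myFUN sees only that group's rows.
# Assumption: dplyr >= 1.1.0 is installed.
res_pick <- df %>%
  group_by(g) %>%
  summarise(res = myFUN(pick(everything())))

res_pick
```

With the values printed in the question, this should give roughly 98.3, 126 and 109 for A, B and C, the same as the loop.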