Search code examples
rdplyrpipebind-rows

How do I summarize data and bind new rows to an existing data frame?


Edited to give reprex in response to comments.

I'm reading in several years of public temperature data for California's 58 counties. I'd like to create a summary, the daily statewide average, and put those means in new rows on top of the data frame with the county data in a single, piped step.

I now do this in three steps: (1) reading the county data, (2) creating the means separately, and (3) rowing binding the newly-created means to the data.

Here's a reprex:

#### Reprex ####
library(tidyverse)

df1 <-
  data.frame(
    name = toupper(c(rep(letters[1:5], each=5))),
    x = as.character(c(rnorm(25, 55, 10)))
  )

df2 <- df1 |>
  group_by(name) |>
  mutate(x = mean(as.numeric(x), narm = TRUE)) |> 
  ungroup() |> 
  select(name, x) |>
  unique() |>
  mutate(name = "Z")
  
df <- rbind(df1, df2) 

Here is what I've tried so far, to no avail. Both throw the error message: Error in UseMethod("summarise") : no applicable method for 'summarise' applied to an object of class "c('double', 'numeric')":

#Test 1
df <- 
  data.frame(
    name = toupper(c(rep(letters[1:5], each=5))),
    x = as.character(c(rnorm(25, 55, 10)))
  ) |> 
  group_by(name) |> 
  select(name, x) |>
  do(bind_rows(., data.frame(name = "Z", 
                             mutate(x = mean(as.numeric(.$x), narm = TRUE))))) |>
  ungroup()
  
#Test 2
df <- 
  df <- 
  data.frame(
    name = toupper(c(rep(letters[1:5], each=5))),
    x = as.character(c(rnorm(25, 55, 10)))
  ) |> 
  group_by(name) |> 
  select(name, x) |>
  do(bind_rows(., data.frame(name = "Z", 
                             mutate(x = summarize(mean(as.numeric(.$x), narm = TRUE)))))) |>
  ungroup()

Any help is much appreciated.


Solution

  • The base R pipe doesn't let you use the object it's piping more than once--and twice is needed here, once to append to and once to get the means--but you can work around that by piping into an anonymous function, like this. (Note that I decreased your data size to 3 groups of 3 to make it easier to see and set a seed so the random number generation is fully reproducible.)

    library(dplyr)
    set.seed(47)
    df <- 
      data.frame(
        name = toupper(c(rep(letters[1:3], each=3))),
        x = as.character(c(rnorm(9, 55, 10)))
      ) |>
      mutate(x = as.numeric(x)) |>
      (
        \(dd) bind_rows(dd, summarize(dd, x = mean(x), .by = name))
      )()
    df
    #    name        x
    # 1     A 74.94696
    # 2     A 62.11143
    # 3     A 56.85405
    # 4     B 52.18235
    # 5     B 56.08776
    # 6     B 44.14263
    # 7     C 45.14518
    # 8     C 55.15131
    # 9     C 52.47954
    # 10    A 64.63748
    # 11    B 50.80424
    # 12    C 50.92534
    

    I don't like this much, stylistically I'd do it in two steps, 1 to read and clean the data, 2 to calculate and append. The base R pipe's placeholder _ requires a named argument, which bind_rows doesn't have, so we still need an anonymous function, but I still prefer this way:

    ## step 1
    df <- 
      data.frame(
        name = toupper(c(rep(letters[1:3], each=3))),
        x = as.character(c(rnorm(9, 55, 10)))
      ) |>
      mutate(x = as.numeric(x))
     
    ## step 2
    df = df |>
      summarize(x = mean(x), .by = name) |> 
      (\(x) bind_rows(df, x))()
    

    If you don't mind the magrittr pipe, you can simplify Step 2 to this:

    ## alternate step 2
    df = df |>
      summarize(x = mean(x), .by = name) %>%
      bind_rows(df, .)