Edited to give reprex
in response to comments.
I'm reading in several years of public temperature data for California's 58 counties. I'd like to create a summary, the daily statewide average, and put those means in new rows on top of the data frame with the county data in a single, piped step.
I now do this in three steps: (1) reading the county data, (2) creating the means separately, and (3) rowing binding the newly-created means to the data.
Here's a reprex:
#### Reprex ####
library(tidyverse)
df1 <-
data.frame(
name = toupper(c(rep(letters[1:5], each=5))),
x = as.character(c(rnorm(25, 55, 10)))
)
df2 <- df1 |>
group_by(name) |>
mutate(x = mean(as.numeric(x), narm = TRUE)) |>
ungroup() |>
select(name, x) |>
unique() |>
mutate(name = "Z")
df <- rbind(df1, df2)
Here is what I've tried so far, to no avail. Both throw the error message: Error in UseMethod("summarise") : no applicable method for 'summarise' applied to an object of class "c('double', 'numeric')"
:
#Test 1
df <-
data.frame(
name = toupper(c(rep(letters[1:5], each=5))),
x = as.character(c(rnorm(25, 55, 10)))
) |>
group_by(name) |>
select(name, x) |>
do(bind_rows(., data.frame(name = "Z",
mutate(x = mean(as.numeric(.$x), narm = TRUE))))) |>
ungroup()
#Test 2
df <-
df <-
data.frame(
name = toupper(c(rep(letters[1:5], each=5))),
x = as.character(c(rnorm(25, 55, 10)))
) |>
group_by(name) |>
select(name, x) |>
do(bind_rows(., data.frame(name = "Z",
mutate(x = summarize(mean(as.numeric(.$x), narm = TRUE)))))) |>
ungroup()
Any help is much appreciated.
The base R pipe doesn't let you use the object it's piping more than once--and twice is needed here, once to append to and once to get the means--but you can work around that by piping into an anonymous function, like this. (Note that I decreased your data size to 3 groups of 3 to make it easier to see and set a seed so the random number generation is fully reproducible.)
library(dplyr)
set.seed(47)
df <-
data.frame(
name = toupper(c(rep(letters[1:3], each=3))),
x = as.character(c(rnorm(9, 55, 10)))
) |>
mutate(x = as.numeric(x)) |>
(
\(dd) bind_rows(dd, summarize(dd, x = mean(x), .by = name))
)()
df
# name x
# 1 A 74.94696
# 2 A 62.11143
# 3 A 56.85405
# 4 B 52.18235
# 5 B 56.08776
# 6 B 44.14263
# 7 C 45.14518
# 8 C 55.15131
# 9 C 52.47954
# 10 A 64.63748
# 11 B 50.80424
# 12 C 50.92534
I don't like this much, stylistically I'd do it in two steps, 1 to read and clean the data, 2 to calculate and append. The base R pipe's placeholder _
requires a named argument, which bind_rows
doesn't have, so we still need an anonymous function, but I still prefer this way:
## step 1
df <-
data.frame(
name = toupper(c(rep(letters[1:3], each=3))),
x = as.character(c(rnorm(9, 55, 10)))
) |>
mutate(x = as.numeric(x))
## step 2
df = df |>
summarize(x = mean(x), .by = name) |>
(\(x) bind_rows(df, x))()
If you don't mind the magrittr
pipe, you can simplify Step 2 to this:
## alternate step 2
df = df |>
summarize(x = mean(x), .by = name) %>%
bind_rows(df, .)