I'm trying to write a function that uses dplyr verbs and takes an Arrow dataset (opened with arrow::open_dataset()) as its first argument and a column of that dataset as its second. Because I need to pass the column name as a string (my actual use case is a Shiny app), I'm using the .data[[.metric]] syntax. The code below reproduces the error I'm getting. Any help or insight is appreciated.
# install.packages(c("dplyr", "ggplot2", "arrow"))
library(dplyr)
arrow::write_parquet(x = ggplot2::mpg, sink = "sample_data.parquet")
dat <- arrow::open_dataset("sample_data.parquet")
glimpse(dat)
get_metric <- function(.data, .metric) {
  .data %>%
    group_by(manufacturer, cyl) %>%
    summarize(
      new_col = sum(.data[[.metric]], na.rm = TRUE)
    ) %>%
    ungroup()
}
get_metric(dat, "cty") %>% collect()
In the version below I call collect() before the tidy evaluation code, so it is essentially regular dplyr code. It runs, but it is slower than the code I had working before I extracted it into a function.
get_metric2 <- function(.data, .metric) {
  .data %>%
    collect() %>%
    group_by(manufacturer, cyl) %>%
    summarize(
      new_col = sum(.data[[.metric]], na.rm = TRUE)
    ) %>%
    ungroup()
}
get_metric2(dat, "cty")
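Use the !! (bang-bang) operator: convert the string to a symbol with rlang::sym(), then unquote it with !! inside summarize().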
arrow::write_parquet(x = ggplot2::mpg, sink = "sample_data.parquet")
dat <- arrow::open_dataset("sample_data.parquet")
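get_metric <- function(.data, .metric) {
  # Turn the column name (a string) into a symbol
  .metric <- rlang::sym(.metric)
  .data %>%
    group_by(manufacturer, cyl) %>%
    summarize(
      # Unquote the symbol so the expression refers to the column directly
      new_col = sum(!!.metric, na.rm = TRUE)
    ) %>%
    ungroup()
}
get_metric(dat, "cty") %>%
  collect()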
# # A tibble: 32 × 3
#    manufacturer   cyl new_col
#    <chr>        <int>   <int>
#  1 audi             4     153
#  2 audi             6     148
#  3 audi             8      16
#  4 chevrolet        8     191
#  5 chevrolet        4      41
#  6 chevrolet        6      53
#  7 dodge            4      18
#  8 dodge            6     225
#  9 dodge            8     243
# 10 ford             8     197
# # ℹ 22 more rows
# # ℹ Use `print(n = ...)` to see more rows
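To see what sym() and !! are doing here, it can help to look at the expression they build without touching Arrow at all. This is only a small sketch: the variable names metric, metric_sym and captured are made up for illustration, and it assumes the rlang package is installed.

```r
library(rlang)

# Hypothetical column name, e.g. as it might arrive from a Shiny input
metric <- "cty"

# sym() converts the string into a symbol, i.e. a bare column name
metric_sym <- sym(metric)

# expr() captures the call without evaluating it;
# !! replaces metric_sym with the symbol it holds
captured <- expr(sum(!!metric_sym, na.rm = TRUE))

print(captured)
# sum(cty, na.rm = TRUE)
```

The captured call refers to cty directly, so by the time summarize() sees the expression it looks the same as if the column name had been typed by hand.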