I think curly curly is a good successor to bang bang. Nevertheless, I am still struggling to understand the tidyverse NSE.
Let's say I want to write a simple function that takes a data frame and a selection of columns, reshapes those columns into long format, and turns the resulting name column into a factor. I would also like to be able to define the factor levels, or by default keep them in the order the selected columns had before reshaping.
I wrote the same function twice, once using bang bang and once using curly curly.
library(tidyverse)
df_test <- data.frame(
  id = c("id1", "id2"),
  item1 = c(0, 4),
  item2 = c(3, 2),
  item3 = c(1, 4),
  item4 = c(3, 4),
  item5 = c(1, NA)
)
# Bang bang way
vars_factor_to_long <- function(data, vars) {
  vars_enq <- rlang::enquo(vars)
  vars_name <- unname(tidyselect::vars_select(unique(names(data)), !!vars_enq))
  data <-
    data %>%
    tidyr::pivot_longer(cols = !!vars_enq, names_to = "item", values_to = "value") %>%
    dplyr::mutate(item = factor(item, levels = vars_name))
  data
}
vars_factor_to_long(df_test, item1:item5)
#> # A tibble: 10 x 3
#>    id    item  value
#>    <fct> <fct> <dbl>
#>  1 id1   item1     0
#>  2 id1   item2     3
#>  3 id1   item3     1
#>  4 id1   item4     3
#>  5 id1   item5     1
#>  6 id2   item1     4
#>  7 id2   item2     2
#>  8 id2   item3     4
#>  9 id2   item4     4
#> 10 id2   item5    NA
# Curly curly way works the same, but doesn't need the enquo
vars_factor_to_long2 <- function(data, vars) {
  vars_name <- unname(tidyselect::vars_select(unique(names(data)), {{ vars }}))
  data <-
    data %>%
    tidyr::pivot_longer(cols = {{ vars }}, names_to = "item", values_to = "value") %>%
    dplyr::mutate(item = factor(item, levels = vars_name))
  data
}
vars_factor_to_long2(df_test, item1:item5)
#> # A tibble: 10 x 3
#>    id    item  value
#>    <fct> <fct> <dbl>
#>  1 id1   item1     0
#>  2 id1   item2     3
#>  3 id1   item3     1
#>  4 id1   item4     3
#>  5 id1   item5     1
#>  6 id2   item1     4
#>  7 id2   item2     2
#>  8 id2   item3     4
#>  9 id2   item4     4
#> 10 id2   item5    NA
Created on 2021-02-05 by the reprex package (v0.3.0)
This works well, but I found out that data-variables can also be tunneled inside strings with curly curly, using a syntax similar to glue. For example:
# Curly curly - tunneling data-variable inside strings with glue-like syntax
mean_by <- function(data, by, vars) {
  data %>%
    group_by({{ by }}) %>%
    summarise("{{ vars }}" := mean({{ vars }}, na.rm = TRUE))
}
mean_by(df_test, id, item1)
#> # A tibble: 2 x 2
#>   id    item1
#>   <fct> <dbl>
#> 1 id1       0
#> 2 id2       4
mean_by(df_test, id, item1:item2)
#> # A tibble: 2 x 2
#>   id    `item1:item2`
#>   <fct>         <dbl>
#> 1 id1             1.5
#> 2 id2             3
Is there a way I can tunnel the data-variable names (as in the second example) to be used as factor levels in the functions from the first example? I suspect there would be a problem with this sort of tunneling if there are multiple columns provided, but still, how can I tunnel at least one column name to the RHS?
Any discussion on this new rlang topic would help me understand more. Thank you.
In any case, you need to tunnel the user's selection and then extract the column names from it, so I think your method is good. I would simplify it a bit by using dplyr instead of the low-level tidyselect, though:
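vars_factor_to_long2 <- function(data, vars) {
  # Resolve the tunneled selection to a character vector of column names
  vars <- names(dplyr::select(data, {{ vars }}))
  data %>%
    tidyr::pivot_longer(
      cols = dplyr::all_of(vars),
      names_to = "item",
      values_to = "value"
    ) %>%
    dplyr::mutate(
      # Levels follow the order in which the columns were selected
      item = factor(item, levels = vars)
    )
}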
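The question also asks for user-defined factor levels and for a way to get a single column name onto the right-hand side. The sketch below is one possible way to do both, not the only one. `vars_factor_to_long3`, its `item_levels` argument, and the `var_label` helper are hypothetical names introduced here for illustration. For a single column, `rlang::as_label(rlang::enquo(var))` returns the same text that `"{{ var }}"` produces on the left-hand side of `:=`.

```r
library(dplyr)
library(tidyr)

df_test <- data.frame(
  id = c("id1", "id2"),
  item1 = c(0, 4),
  item2 = c(3, 2),
  item3 = c(1, 4),
  item4 = c(3, 4),
  item5 = c(1, NA)
)

# Hypothetical variant: `item_levels` is an assumed extra argument.
# NULL means "keep the order of the selected columns".
vars_factor_to_long3 <- function(data, vars, item_levels = NULL) {
  # Resolve the tunneled selection to a character vector of column names
  vars <- names(dplyr::select(data, {{ vars }}))
  if (is.null(item_levels)) {
    item_levels <- vars
  }
  data %>%
    tidyr::pivot_longer(
      cols = dplyr::all_of(vars),
      names_to = "item",
      values_to = "value"
    ) %>%
    dplyr::mutate(item = factor(item, levels = item_levels))
}

# Default: levels follow the selection order
long_default <- vars_factor_to_long3(df_test, item1:item5)
levels(long_default$item)

# Custom: levels supplied by the caller
long_custom <- vars_factor_to_long3(
  df_test, item1:item3,
  item_levels = c("item3", "item2", "item1")
)
levels(long_custom$item)

# Hypothetical helper: turn one tunneled column into its name as a string.
# For a multi-column selection such as item1:item2 this returns the text
# of the whole expression, not the individual column names.
var_label <- function(var) {
  rlang::as_label(rlang::enquo(var))
}
var_label(item1)
```

Because the label of a multi-column selection is just the text of the expression, resolving the selection with `dplyr::select()` first remains the safer route whenever more than one column may be passed.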