I'm working on building a function that I will manipulate a data frame based on a string. Within the function, I'll build a column name as from the string and use it to manipulate the data frame, something like this:
library(dplyr)
orig_df <- data_frame(
id = 1:3
, amt = c(100, 200, 300)
, anyA = c(T,F,T)
, othercol = c(F,F,T)
)
summarize_my_df_broken <- function(df, my_string) {
my_column <- quo(paste0("any", my_string))
df %>%
filter(!!my_column) %>%
group_by(othercol) %>%
summarize(
n = n()
, total = sum(amt)
) %>%
# I need the original string as new column which is why I can't
# pass in just the column name
mutate(stringid = my_string)
}
summarize_my_df_works <- function(df, my_string) {
my_column <- quo(paste0("any", my_string))
df %>%
group_by(!!my_column, othercol) %>%
summarize(
n = n()
, total = sum(amt)
) %>%
mutate(stringid = my_string)
}
# throws an error:
# Argument 2 filter condition does not evaluate to a logical vector
summarize_my_df_broken(orig_df, "A")
# works just fine
summarize_my_df_works(orig_df, "A")
I understand what the problem is: unquoting the quosure as an argument to filter()
in the broken version is not referencing the actual column anyA.
What I don't understand is why it works in summarize()
, but not in filter()
--why is there a difference?
Right now you are are making quosures of strings, not symbol names. That's not how those are supposed to be used. There's a big difference between quo("hello")
and quo(hello)
. If you want to make a proper symbol name from a string, you need to use rlang::sym
. So a quick fix would be
summarize_my_df_broken <- function(df, my_string) {
my_column <- rlang::sym(paste0("any", my_string))
...
}
If you look more closely I think you'll see the group_by/summarize
isn't actually working the way you expect either (though you just don't get the same error message). These two do not produce the same results
summarize_my_df_works(orig_df, "A")
# `paste0("any", my_string)` othercol n total
# <chr> <lgl> <int> <dbl>
# 1 anyA FALSE 2 300
# 2 anyA TRUE 1 300
orig_df %>%
group_by(anyA, othercol) %>%
summarize(
n = n()
, total = sum(amt)
) %>%
mutate(stringid = "A")
# anyA othercol n total stringid
# <lgl> <lgl> <int> <dbl> <chr>
# 1 FALSE FALSE 1 200 A
# 2 TRUE FALSE 1 100 A
# 3 TRUE TRUE 1 300 A
Again the problem is using a string instead of a symbol.