I'm trying to use dplyr's across and case_when across my entire dataset, so whenever it sees "Strongly Agree" it changes it to a numeric 5, "Agree" to a numeric 4, and so on. I've tried looking at this answer, but I'm getting an error because my dataset has logical and numeric columns and R rightfully says that "Agree" can't be in a logical column, etc.
Here's my data:
library(dplyr)
test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               s2rl = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               f2rl = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))
Ideally, I'd like one command that would allow me to go across the entire dataset. However, if that can't be done, I'd like to have one argument that would allow me to use multiple across(contains()) arguments (i.e., here contains "s" or "f").
Here's what I've tried already to no avail:
library(dplyr)
test %>%
mutate(across(.),
~ case_when(. == "Strongly Agree" ~ 5,
. == "Agree" ~ 4,
. == "Neutral" ~ 3,
. == "Disagree" ~ 2,
. == "Strongly Disagree" ~ 1,
TRUE ~ NA))
Error: Problem with `mutate()` input `..1`.
x Must subset columns with a valid subscript vector.
x Subscript has the wrong type `tbl_df<
name: character
date: character
s1 : character
s2rl: character
f1 : character
f2rl: character
exam: double
>`.
ℹ It must be numeric or character.
ℹ Input `..1` is `across(.)`.
We can use matches() to pass a regular expression that selects only the columns whose names start with "s" or "f":
library(dplyr)
test %>%
  mutate(across(matches('^(s|f)'), ~ case_when(. == "Strongly Agree" ~ 5,
                                               . == "Agree" ~ 4,
                                               . == "Neutral" ~ 3,
                                               . == "Disagree" ~ 2,
                                               . == "Strongly Disagree" ~ 1,
                                               TRUE ~ NA_real_)))
Output:
# A tibble: 3 x 8
  name   date          s1  s2rl    f1  f2rl  exam early
  <chr>  <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
1 Justin 2021-08-09     4     4     5     5    90 TRUE
2 Corey  2021-10-29     3     3     2     2    99 FALSE
3 Sibley 2021-01-01     1     1     1     1   100 FALSE
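If a regular expression feels heavier than needed, the same columns can probably be picked with starts_with(), which accepts a character vector of prefixes and takes the union of the matches. The sketch below reuses the question's data; the object name recoded is only illustrative. Note that, like the matches() version, it puts the lambda inside across() and ends with a numeric NA (NA_real_) so that every branch of case_when() has the same type.

```r
library(dplyr)

# Same example data as in the question
test <- tibble(
  name  = c("Justin", "Corey", "Sibley"),
  date  = c("2021-08-09", "2021-10-29", "2021-01-01"),
  s1    = c("Agree", "Neutral", "Strongly Disagree"),
  s2rl  = c("Agree", "Neutral", "Strongly Disagree"),
  f1    = c("Strongly Agree", "Disagree", "Strongly Disagree"),
  f2rl  = c("Strongly Agree", "Disagree", "Strongly Disagree"),
  exam  = c(90, 99, 100),
  early = c(TRUE, FALSE, FALSE)
)

# starts_with() with several prefixes selects s1, s2rl, f1 and f2rl;
# name, date, exam and early are left untouched
recoded <- test %>%
  mutate(across(starts_with(c("s", "f")), ~ case_when(
    .x == "Strongly Agree"    ~ 5,
    .x == "Agree"             ~ 4,
    .x == "Neutral"           ~ 3,
    .x == "Disagree"          ~ 2,
    .x == "Strongly Disagree" ~ 1,
    TRUE                      ~ NA_real_
  )))
```

This only works as long as no other column name begins with "s" or "f"; with a wider dataset, listing the columns explicitly or using a stricter pattern may be safer.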
According to ?across
across() makes it easy to apply the same transformation to multiple columns, allowing you to use select() semantics inside in "data-masking" functions like summarise() and mutate().
If we check ?select, it lists the various selection helpers for choosing columns, all of which can be used in across() as well:
Tidyverse selections implement a dialect of R where operators make it easy to select variables:
: for selecting a range of consecutive variables.
! for taking the complement of a set of variables.
& and | for selecting the intersection or the union of two sets of variables.
c() for combining selections.
In addition, you can use selection helpers. Some helpers select specific columns:
everything(): Matches all variables.
last_col(): Select last variable, possibly with an offset.
These helpers select variables by matching patterns in their names:
starts_with(): Starts with a prefix.
ends_with(): Ends with a suffix.
contains(): Contains a literal string.
matches(): Matches a regular expression.
num_range(): Matches a numerical range like x01, x02, x03.
These helpers select variables from a character vector:
all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.
any_of(): Same as all_of(), except that no error is thrown for names that don't exist.
This helper selects variables with a function:
where(): Applies a function to all variables and selects those for which the function returns TRUE.
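The where() helper at the end of that list is one way to get close to the "one command across the entire dataset" asked for in the question: instead of selecting by name, select by content. The following is a minimal sketch, not the only way to do it. The lookup vector likert and the predicate is_likert are hypothetical names introduced here for illustration, and the sketch assumes a column counts as a survey item only if it is character and every value is one of the five labels.

```r
library(dplyr)

# Same example data as in the question
test <- tibble(
  name  = c("Justin", "Corey", "Sibley"),
  date  = c("2021-08-09", "2021-10-29", "2021-01-01"),
  s1    = c("Agree", "Neutral", "Strongly Disagree"),
  s2rl  = c("Agree", "Neutral", "Strongly Disagree"),
  f1    = c("Strongly Agree", "Disagree", "Strongly Disagree"),
  f2rl  = c("Strongly Agree", "Disagree", "Strongly Disagree"),
  exam  = c(90, 99, 100),
  early = c(TRUE, FALSE, FALSE)
)

# Hypothetical lookup table: label -> score
likert <- c(
  "Strongly Disagree" = 1,
  "Disagree"          = 2,
  "Neutral"           = 3,
  "Agree"             = 4,
  "Strongly Agree"    = 5
)

# Hypothetical predicate: TRUE only for character columns made up
# entirely of known labels (so name and date are skipped)
is_likert <- function(x) is.character(x) && all(x %in% names(likert))

# Index the lookup vector by label, then drop the names it carries along
recoded <- test %>%
  mutate(across(where(is_likert), ~ unname(likert[.x])))
```

Because the predicate looks at the values rather than the names, the logical and numeric columns never reach the recoding step, which sidesteps the type error described in the question. One caveat: a survey column containing a missing value or a misspelled label would fail the all() check and be left as character, so the predicate may need loosening for messier data.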