Disclaimer: this is a very elemental question. I'll use an example to make it easier, but the question has nothing to do with the example itself.
Suppose you have a data frame df:
# A tibble: 5 × 4
  index     a     b     c
  <int> <int> <dbl> <dbl>
1     1     0     0     1
2     2     1     0     0
3     3     0     1     0
4     4     0     1     0
5     5     1     0     0
And you want to gather the dummies into a single factor column. Taking inspiration from eatATA::dummiesToFactor(), you could use something like:
dum2fac <- function(data) { factor(names(data)[max.col(data)]) }
df %>% mutate(name = dum2fac(across(a:c)))
# A tibble: 5 × 5
  index     a     b     c name
  <int> <int> <dbl> <dbl> <fct>
1     1     0     0     1 c
2     2     1     0     0 a
3     3     0     1     0 b
4     4     0     1     0 b
5     5     1     0     0 a
Now suppose you want to modify dum2fac() to allow for something like the following:
df %>% mutate(name = dum2fac(a:c))
I tried one specific path, and from that my "more elemental" question appeared. This was what I tried:
dum2fac <- function(expr) {
  data <- select(???, {{expr}})
  factor(names(data)[max.col(data)])
}
Here a:c is passed as expr, and ??? should stand for "the dataset that is being used in the dplyr context". Put another way: across(a:c) doesn't refer directly to the dataset df; it just knows that it needs to access it because of the context in which it is used, and I want my function to be able to do the same.
Some concepts I figured could help were the "rlang fake data pronoun" .data, and some higher-order functions/objects that are used in across and mutate, like the R6 object DataMask, peek_mask(), and others that probably aren't good practice to use even if possible.
Note: if you have a better way to rewrite dum2fac(), I'd be glad to hear it, so please add it too. But again, that's not exactly what this question is about.
Dummy data:
set.seed(2023)
df <- tibble(index = 1:5,
             a = sample(0:1, 5, TRUE),
             b = (1 - a) * sample(0:1, 5, TRUE),
             c = 1 - a - b)
You can use across() or (more idiomatically) pick() inside your own function:
library(dplyr)
set.seed(2023)
df <- tibble(
  index = 1:5,
  a = sample(0:1, 5, TRUE),
  b = (1 - a) * sample(0:1, 5, TRUE),
  c = 1 - a - b
)
dum2fac <- function(expr) {
  data <- pick({{ expr }})
  factor(names(data)[max.col(data)])
}
df %>% mutate(name = dum2fac(a:c))
#> # A tibble: 5 × 5
#>   index     a     b     c name
#>   <int> <int> <dbl> <dbl> <fct>
#> 1     1     0     0     1 c
#> 2     2     1     0     0 a
#> 3     3     0     1     0 b
#> 4     4     0     1     0 b
#> 5     5     1     0     0 a
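For completeness, here is a rough sketch of the across() variant, which may be useful on dplyr versions older than 1.1.0, where pick() is not available. The name dum2fac_across is only illustrative, and the data is hard-coded to the values printed above so the result does not depend on the random seed. On dplyr 1.1.0 and later, calling across() without a function is deprecated and may emit a warning, so pick() is the better choice there.

```r
library(dplyr)

# Same values as the tibble printed above, hard-coded for reproducibility
df <- tibble(
  index = 1:5,
  a = c(0L, 1L, 0L, 0L, 1L),
  b = c(0, 0, 1, 1, 0),
  c = c(1, 0, 0, 0, 0)
)

# Illustrative name: across() with only a column selection returns the
# selected columns of the current data as a tibble
dum2fac_across <- function(expr) {
  data <- across({{ expr }})
  factor(names(data)[max.col(data)])
}

res <- df %>% mutate(name = dum2fac_across(a:c))
```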
If you want the full data without selections, use pick(everything()).
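To tie this back to the attempt in the question, pick(everything()) can stand in for the ??? placeholder. The following is a minimal sketch under that assumption; dum2fac_select is an illustrative name, and the data is hard-coded to the values printed above.

```r
library(dplyr)

# Same values as the tibble printed above, hard-coded for reproducibility
df <- tibble(
  index = 1:5,
  a = c(0L, 1L, 0L, 0L, 1L),
  b = c(0, 0, 1, 1, 0),
  c = c(1, 0, 0, 0, 0)
)

# Illustrative name: pick(everything()) supplies the current data,
# and select() then narrows it to the columns passed as expr
dum2fac_select <- function(expr) {
  data <- select(pick(everything()), {{ expr }})
  factor(names(data)[max.col(data)])
}

res <- df %>% mutate(name = dum2fac_select(a:c))
```

This is equivalent to calling pick({{ expr }}) directly, which stays the shorter option. Spelling it out mainly shows that the "dataset in the current dplyr context" is an ordinary tibble once picked.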