I am trying to build a helper function that extract the digits in the column given in argument. I'm able to use my function inside mutate
(and repeat it for all columns of interest), but it doesn't seems to work inside mutate_at
.
Here is an example of what my data looks like :
> set.seed(20190928)
> evalYr <- 2018
> n <- 5
> (df <- data.frame(
+ AY = sample(2016:2019, n, replace = T),
+ Pay00 = rgamma(n, 2, 1/1000),
+ Pay01 = rgamma(n, 2, 1/1000),
+ Pay02 = rgamma(n, 2, 1/1000),
+ Pay03 = rgamma(n, 2, 1/1000)
+ ))
AY Pay00 Pay01 Pay02 Pay03
1 2018 2520.3772 2338.9490 919.8245 629.1657
2 2016 259.7804 1543.4450 661.6488 2382.7916
3 2018 2446.3075 312.5143 2297.9717 942.5627
4 2017 1386.6288 4179.0352 2370.2669 1846.5838
5 2018 541.8261 2104.4589 2622.1758 2606.0694
So I've build (using dplyr
syntax) this helper to mutate on every PayXX
column I have :
# Helper function to get the number inside column `PayXX` name
f1 <- function(pmt) enquo(pmt) %>% quo_name() %>% str_extract('(\\d)+') %>% as.numeric()
This function is working fine with dplyr::mutate
:
> df %>% mutate(Pay00_numcol = f1(Pay00),
+ Pay01_numcol = f1(Pay01),
+ Pay02_numcol = f1(Pay02),
+ Pay03_numcol = f1(Pay03))
AY Pay00 Pay01 Pay02 Pay03 Pay00_numcol Pay01_numcol Pay02_numcol Pay03_numcol
1 2018 2520.3772 2338.9490 919.8245 629.1657 0 1 2 3
2 2016 259.7804 1543.4450 661.6488 2382.7916 0 1 2 3
3 2018 2446.3075 312.5143 2297.9717 942.5627 0 1 2 3
4 2017 1386.6288 4179.0352 2370.2669 1846.5838 0 1 2 3
5 2018 541.8261 2104.4589 2622.1758 2606.0694 0 1 2 3
But when I try to use the same function inside mutate_at
, it returns NA's :
> df %>% mutate_at(vars(starts_with('Pay')), list(numcol = ~f1(.)))
AY Pay00 Pay01 Pay02 Pay03 Pay00_numcol Pay01_numcol Pay02_numcol Pay03_numcol
1 2018 2520.3772 2338.9490 919.8245 629.1657 NA NA NA NA
2 2016 259.7804 1543.4450 661.6488 2382.7916 NA NA NA NA
3 2018 2446.3075 312.5143 2297.9717 942.5627 NA NA NA NA
4 2017 1386.6288 4179.0352 2370.2669 1846.5838 NA NA NA NA
5 2018 541.8261 2104.4589 2622.1758 2606.0694 NA NA NA NA
Anyone ever had a similar problem? How do I deal with the mutate_at
function in this case?
Thanks,
library(tidyverse)
library(stringr)
set.seed(20190928)
evalYr <- 2018
n <- 5
(df <- data.frame(
AY = sample(2016:2019, n, replace = T),
Pay00 = rgamma(n, 2, 1/1000),
Pay01 = rgamma(n, 2, 1/1000),
Pay02 = rgamma(n, 2, 1/1000),
Pay03 = rgamma(n, 2, 1/1000)
))
# Helper function to get the number inside column `PayXX` name
f1 <- function(pmt) enquo(pmt) %>% quo_name() %>% str_extract('(\\d)+') %>% as.numeric()
# Working
df %>% mutate(Pay00_numcol = f1(Pay00),
Pay01_numcol = f1(Pay01),
Pay02_numcol = f1(Pay02),
Pay03_numcol = f1(Pay03))
# Not working
df %>% mutate_at(vars(starts_with('Pay')), list(numcol = ~f1(.)))
The first way I thought of is that this might be easier with reshaping the data. However, it still takes a tangle of tidyr
functions to get 1) a column of "Pay00", "Pay01", etc; 2) extract the numbers; 3) manipulate so you can use tidyr::spread
to get back to wide-shaped; and 4) spread and remove the "_value" bit I tacked on.
I believe there's a nicer way to do this with the recent version of tidyr
, since the new pivot_wider
function should be able to take more than one column as value
. I haven't messed with this at all, but maybe someone else can write that up.
library(tidyverse)
df %>%
rowid_to_column() %>%
gather(key, value, -AY, -rowid) %>%
mutate(numcol = as.numeric(str_extract(key, "\\d+$"))) %>%
gather(key = coltype, value, value, numcol) %>%
unite(key, key, coltype) %>%
spread(key, value) %>%
select(AY, ends_with("value"), ends_with("numcol")) %>%
rename_all(str_remove, "_value")
#> AY Pay00 Pay01 Pay02 Pay03 Pay00_numcol Pay01_numcol
#> 1 2018 2520.3772 2338.9490 919.8245 629.1657 0 1
#> 2 2016 259.7804 1543.4450 661.6488 2382.7916 0 1
#> 3 2018 2446.3075 312.5143 2297.9717 942.5627 0 1
#> 4 2017 1386.6288 4179.0352 2370.2669 1846.5838 0 1
#> 5 2018 541.8261 2104.4589 2622.1758 2606.0694 0 1
#> Pay02_numcol Pay03_numcol
#> 1 2 3
#> 2 2 3
#> 3 2 3
#> 4 2 3
#> 5 2 3
Or, if you want to stick with the tidyeval approach: get the names of the columns-as-quosures you're calling your function on. Just be careful that if you use list(numcol = ~f1(.))
notation, all of those quosures will just come up as .
f1 <- function(pmt) {
str_extract(rlang::as_name(enquo(pmt)), "\\d+$") %>%
as.numeric()
}
df %>%
mutate_at(vars(starts_with("Pay")), list(numcol = f1))
# same output as prev