Search code examples
rdplyrquosure

Using dplyr quosure custom function with mutate_at


I am trying to build a helper function that extract the digits in the column given in argument. I'm able to use my function inside mutate (and repeat it for all columns of interest), but it doesn't seems to work inside mutate_at.

Here is an example of what my data looks like :

> set.seed(20190928)
> evalYr <- 2018
> n <- 5
> (df <- data.frame(
+     AY = sample(2016:2019, n, replace = T),
+     Pay00 = rgamma(n, 2, 1/1000),
+     Pay01 = rgamma(n, 2, 1/1000),
+     Pay02 = rgamma(n, 2, 1/1000),
+     Pay03 = rgamma(n, 2, 1/1000)
+ ))
    AY     Pay00     Pay01     Pay02     Pay03
1 2018 2520.3772 2338.9490  919.8245  629.1657
2 2016  259.7804 1543.4450  661.6488 2382.7916
3 2018 2446.3075  312.5143 2297.9717  942.5627
4 2017 1386.6288 4179.0352 2370.2669 1846.5838
5 2018  541.8261 2104.4589 2622.1758 2606.0694

So I've build (using dplyr syntax) this helper to mutate on every PayXX column I have :

# Helper function to get the number inside column `PayXX` name
f1 <- function(pmt) enquo(pmt) %>% quo_name() %>% str_extract('(\\d)+') %>% as.numeric()

This function is working fine with dplyr::mutate :

> df %>% mutate(Pay00_numcol = f1(Pay00),
+               Pay01_numcol = f1(Pay01),
+               Pay02_numcol = f1(Pay02),
+               Pay03_numcol = f1(Pay03))
    AY     Pay00     Pay01     Pay02     Pay03 Pay00_numcol Pay01_numcol Pay02_numcol Pay03_numcol
1 2018 2520.3772 2338.9490  919.8245  629.1657            0            1            2            3
2 2016  259.7804 1543.4450  661.6488 2382.7916            0            1            2            3
3 2018 2446.3075  312.5143 2297.9717  942.5627            0            1            2            3
4 2017 1386.6288 4179.0352 2370.2669 1846.5838            0            1            2            3
5 2018  541.8261 2104.4589 2622.1758 2606.0694            0            1            2            3

But when I try to use the same function inside mutate_at, it returns NA's :

> df %>% mutate_at(vars(starts_with('Pay')), list(numcol = ~f1(.)))
    AY     Pay00     Pay01     Pay02     Pay03 Pay00_numcol Pay01_numcol Pay02_numcol Pay03_numcol
1 2018 2520.3772 2338.9490  919.8245  629.1657           NA           NA           NA           NA
2 2016  259.7804 1543.4450  661.6488 2382.7916           NA           NA           NA           NA
3 2018 2446.3075  312.5143 2297.9717  942.5627           NA           NA           NA           NA
4 2017 1386.6288 4179.0352 2370.2669 1846.5838           NA           NA           NA           NA
5 2018  541.8261 2104.4589 2622.1758 2606.0694           NA           NA           NA           NA

Anyone ever had a similar problem? How do I deal with the mutate_at function in this case?

Thanks,

Reproductible example

library(tidyverse)
library(stringr)
set.seed(20190928)
evalYr <- 2018
n <- 5
(df <- data.frame(
    AY = sample(2016:2019, n, replace = T),
    Pay00 = rgamma(n, 2, 1/1000),
    Pay01 = rgamma(n, 2, 1/1000),
    Pay02 = rgamma(n, 2, 1/1000),
    Pay03 = rgamma(n, 2, 1/1000)
))

# Helper function to get the number inside column `PayXX` name
f1 <- function(pmt) enquo(pmt) %>% quo_name() %>% str_extract('(\\d)+') %>% as.numeric()

# Working
df %>% mutate(Pay00_numcol = f1(Pay00),
              Pay01_numcol = f1(Pay01),
              Pay02_numcol = f1(Pay02),
              Pay03_numcol = f1(Pay03))

# Not working
df %>% mutate_at(vars(starts_with('Pay')), list(numcol = ~f1(.)))

Solution

  • The first way I thought of is that this might be easier with reshaping the data. However, it still takes a tangle of tidyr functions to get 1) a column of "Pay00", "Pay01", etc; 2) extract the numbers; 3) manipulate so you can use tidyr::spread to get back to wide-shaped; and 4) spread and remove the "_value" bit I tacked on.

    I believe there's a nicer way to do this with the recent version of tidyr, since the new pivot_wider function should be able to take more than one column as value. I haven't messed with this at all, but maybe someone else can write that up.

    library(tidyverse)
    
    df %>%
      rowid_to_column() %>%
      gather(key, value, -AY, -rowid) %>%
      mutate(numcol = as.numeric(str_extract(key, "\\d+$"))) %>%
      gather(key = coltype, value, value, numcol) %>%
      unite(key, key, coltype) %>%
      spread(key, value) %>%
      select(AY, ends_with("value"), ends_with("numcol")) %>%
      rename_all(str_remove, "_value")
    #>     AY     Pay00     Pay01     Pay02     Pay03 Pay00_numcol Pay01_numcol
    #> 1 2018 2520.3772 2338.9490  919.8245  629.1657            0            1
    #> 2 2016  259.7804 1543.4450  661.6488 2382.7916            0            1
    #> 3 2018 2446.3075  312.5143 2297.9717  942.5627            0            1
    #> 4 2017 1386.6288 4179.0352 2370.2669 1846.5838            0            1
    #> 5 2018  541.8261 2104.4589 2622.1758 2606.0694            0            1
    #>   Pay02_numcol Pay03_numcol
    #> 1            2            3
    #> 2            2            3
    #> 3            2            3
    #> 4            2            3
    #> 5            2            3
    

    Or, if you want to stick with the tidyeval approach: get the names of the columns-as-quosures you're calling your function on. Just be careful that if you use list(numcol = ~f1(.)) notation, all of those quosures will just come up as .

    f1 <- function(pmt) {
      str_extract(rlang::as_name(enquo(pmt)), "\\d+$") %>%
        as.numeric()
    }
    
    df %>%
      mutate_at(vars(starts_with("Pay")), list(numcol = f1))
    # same output as prev