
Perform multiple operations on multiple columns with dplyr


I am working in R and have a data frame containing several series. For most of its columns, I need to compute the following two quantities:

  1. max(0,x_t - max(x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}))
  2. max(0,-1 + x_t / max(x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}))

I tried this for a single column:

df %>% mutate( pmax(0,x - pmax(lag(x), lag(x,2), lag(x,3), lag(x,4))) )

but I guess it’s possible to do this for all columns and for both operations using dplyr’s across and purrr syntax. Any ideas on how to do this?


Solution

  • You could use the across() function in the dplyr package.

    #load dplyr: provides %>%, mutate(), across() and the vector version of lag()
    library(dplyr)

    #Define some test data
    df <- data.frame(x = round(runif(100, 10, 15)), y = round(rnorm(100, 10, 5), 1))
    
    #define the function to apply on each column (operation 1 from the question)
    #lag() is dplyr::lag(), so the first four results are NA
    mypmax <- function(i){
        pmax(0, i - pmax(lag(i), lag(i,2), lag(i,3), lag(i,4)))
    }
    
    #apply the function on columns 1 & 2.
    #create new column names to store the results.
    df %>% mutate(across(c(1,2), mypmax, .names = "new_{.col}"))
    
    #first 10 rows of the output (values differ between runs because the data are random)
         x    y new_x new_y
    1   12  7.3    NA    NA
    2   14 10.9    NA    NA
    3   10 17.8    NA    NA
    4   14 12.5    NA    NA
    5   15 10.0     1   0.0
    6   14 11.6     0   0.0
    7   10  7.9     0   0.0
    8   12  8.6     0   0.0
    9   11 11.3     0   0.0
    10  11  4.7     0   0.0
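
The question asks for two operations, so here is a minimal sketch of how both could be applied in a single across() call by passing a named list of functions. The helper names prev_max4, excess_abs and excess_rel are made up for this illustration, the data are a small hand-written example rather than the random data above, and where(is.numeric) is only one assumed way of picking "most" columns.

```r
library(dplyr)

# Small deterministic data frame (values are invented for illustration)
df <- data.frame(
  x = c(12, 14, 10, 14, 15, 14, 10, 12),
  y = c(2, 4, 8, 5, 10, 12, 6, 24)
)

# Maximum of the four previous observations (NA for the first four rows)
prev_max4 <- function(v) {
  pmax(dplyr::lag(v), dplyr::lag(v, 2), dplyr::lag(v, 3), dplyr::lag(v, 4))
}

# Operation 1: max(0, x_t - max(x_{t-1}, ..., x_{t-4}))
excess_abs <- function(v) pmax(0, v - prev_max4(v))

# Operation 2: max(0, -1 + x_t / max(x_{t-1}, ..., x_{t-4}))
excess_rel <- function(v) pmax(0, v / prev_max4(v) - 1)

# Apply both functions to every numeric column;
# {.col} is the column name and {.fn} the name given in the list
result <- df %>%
  mutate(across(
    where(is.numeric),
    list(abs = excess_abs, rel = excess_rel),
    .names = "{.col}_{.fn}"
  ))

print(result)
```

This adds the columns x_abs, x_rel, y_abs and y_rel next to the originals. To restrict the calculation to specific columns, where(is.numeric) can be replaced by any tidyselect expression, for example c(x, y) or the positional c(1,2) used above.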