r, pipe, magrittr

Avoiding/disabling lazy evaluation for pipeline processing


Imagine I have a set of functions for data processing, for example:

procA <- function(input){
  cat('\n Now processing #A') # message just to log pipeline flow 
  
  # Actual data processing, may include some diagnostic messaging:
  cat('\n #A: ', dim(input))
  input$procA <- 'procA'
  
  return(input)
}

procB <- function(input){
  cat('\n Now processing #B') # message just to log pipeline flow 
  
  # Actual data processing, may include some diagnostic messaging:
  cat('\n #B: ', dim(input))
  input$procB <- 'procB' 
  
  return(input)
}

procC <- function(input){
  cat('\n Now processing #C') # message just to log pipeline flow 
  
  # Actual data processing, may include some diagnostic messaging:
  cat('\n #C: ', dim(input))
  input$procC <- 'procC' 
  
  return(input)
}

And I combine them in a pipeline, for example:

library(magrittr) # provides %>%
data(iris)

iris_processed <-
  iris %>% 
  procA %>% 
  procB %>% 
  procC

The message output is as follows:

Now processing #C
Now processing #B
Now processing #A
#A: 150 5
#B: 150 6
#C: 150 7

Because of lazy evaluation, the log messages appear in reverse order, which makes the pipeline harder to debug. My current workaround is to add input <- eval(input) at the beginning of each function. Is there a better solution, or an established good practice for this?
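The reversed order can be reproduced without any pipe at all. Below is a minimal sketch, assuming the pipeline behaves like the nested call procC(procB(procA(iris))); the functions f, g and h are hypothetical stand-ins, not part of the code above. Each function logs before it first touches its argument, so the outermost function logs first and the innermost one last.

```r
# Hypothetical stand-ins for procA, procB and procC.
# Each one logs first and only then uses (and thereby evaluates) its argument.
f <- function(x) { cat("enter f\n"); x + 1 }
g <- function(x) { cat("enter g\n"); x * 2 }
h <- function(x) { cat("enter h\n"); x - 3 }

# Nested call: h starts running before g(f(1)) has been evaluated,
# because R only evaluates an argument when it is first needed.
log_lines <- capture.output(result <- h(g(f(1))))

print(log_lines) # "enter h" "enter g" "enter f"
print(result)    # 1
```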


Solution

  • We can use the magrittr eager pipe, %!>%, which evaluates each step before passing its result to the next one. Note that library(magrittr) is required. It is not sufficient to just use library(dplyr).

    library(magrittr)
    
    iris_processed <-
      iris %!>% 
      procA %!>% 
      procB %!>% 
      procC
    
    ## Now processing #A
    ##  #A:  150 5
    ##  Now processing #B
    ##  #B:  150 6
    ##  Now processing #C
    ##  #C:  150 7
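If changing the pipe operator is not an option, another possibility is to keep the approach from the question but use base R's force() instead of input <- eval(input). The sketch below is only an illustration under a few assumptions: make_step is a hypothetical helper introduced here to avoid repeating three near-identical functions, and the pipeline is written as a nested call so the example runs without magrittr.

```r
# Hypothetical factory that builds procA/procB/procC-style steps.
make_step <- function(label) {
  force(label) # fix the label at creation time
  function(input) {
    force(input) # evaluate all upstream steps before logging anything
    cat("Now processing #", label, "\n", sep = "")
    input[[paste0("proc", label)]] <- paste0("proc", label)
    input
  }
}

procA <- make_step("A")
procB <- make_step("B")
procC <- make_step("C")

# Nested call used here in place of the pipeline from the question.
log_lines <- capture.output(iris_processed <- procC(procB(procA(iris))))

print(log_lines)            # messages now appear in A, B, C order
print(ncol(iris_processed)) # 8: the 5 original columns plus procA, procB, procC
```

force(input) does nothing beyond evaluating the argument, so it states the intent more directly than eval(input). The downside compared with %!>% is that every function in the pipeline has to remember to do it.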