Imagine I have a set of functions for data processing, for example:
procA <- function(input){
  cat('\n Now processing #A') # message just to log pipeline flow
  # Actual data processing, may include some diagnostic messaging:
  cat('\n #A: ', dim(input))
  input$procA <- 'procA'
  return(input)
}
procB <- function(input){
  cat('\n Now processing #B') # message just to log pipeline flow
  # Actual data processing, may include some diagnostic messaging:
  cat('\n #B: ', dim(input))
  input$procB <- 'procB'
  return(input)
}
procC <- function(input){
  cat('\n Now processing #C') # message just to log pipeline flow
  # Actual data processing, may include some diagnostic messaging:
  cat('\n #C: ', dim(input))
  input$procC <- 'procC'
  return(input)
}
And I combine them in a pipeline, for example:
data(iris)
iris_processed <-
  iris %>%
  procA %>%
  procB %>%
  procC
The message output is as follows:
Now processing #C
Now processing #B
Now processing #A
#A: 150 5
#B: 150 6
#C: 150 7
Due to lazy evaluation, the log messages appear in reverse order, which makes the pipeline harder to debug. So far my solution is to add input <- eval(input) at the beginning of each function. Is there a better solution, or any good-practice standard for this?
We can use the magrittr eager pipe, %!>%, which evaluates each step before passing its result on. Note that library(magrittr) is needed. It is not sufficient to just use library(dplyr), because dplyr re-exports only %>%.
library(magrittr)
iris_processed <-
  iris %!>%
  procA %!>%
  procB %!>%
  procC
## Now processing #A
## #A: 150 5
## Now processing #B
## #B: 150 6
## Now processing #C
## #C: 150 7
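If switching pipes is not an option, another possibility is to call force(input) as the first statement of each function, which is the more idiomatic form of the input <- eval(input) workaround from the question. The sketch below is only an illustration: make_step is a hypothetical helper (not from the original post) that builds steps shaped like procA, procB and procC, and plain nested calls stand in for the lazy %>% of magrittr 2.0, which evaluates its steps much like nested function calls.

```r
# Hypothetical factory (an assumption for illustration) that builds steps
# shaped like procA/procB/procC, with force() as the first statement.
make_step <- function(tag) {
  force(tag)
  function(input) {
    force(input)  # evaluate the upstream step before logging anything
    cat("\n Now processing #", tag, sep = "")
    cat("\n #", tag, ": ", sep = "")
    cat(dim(input))  # diagnostic message, as in the original functions
    input[[paste0("proc", tag)]] <- paste0("proc", tag)
    input
  }
}

procA <- make_step("A")
procB <- make_step("B")
procC <- make_step("C")

# Nested calls evaluate arguments lazily, like the lazy pipe does.
# capture.output() collects the log so its order can be inspected.
log_lines <- capture.output(
  iris_processed <- procC(procB(procA(iris)))
)
cat(log_lines, sep = "\n")
```

Because each function forces its argument first, the upstream step runs and logs before the current step prints anything, so the messages come out in pipeline order regardless of which pipe is used.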