In my R script, I would like to print a message in the middle of a long pipe (which takes a long time to run) without breaking the pipe. I need this to track the progress of the running pipeline.
I tried to create a simple function that prints a message without breaking the pipe:
CatPipe <- function(x, message = NULL) {
cat(message, sep = "\n")
return(x)
}
Here is an example implementation:
require(dplyr)
mtcars %>%
CatPipe("Hello1") %>%
{
Sys.sleep(1)
.
} %>%
CatPipe("Hello2") %>%
{
Sys.sleep(1)
.
} %>%
head(5) %>%
CatPipe("Hello3")
However, the printed messages were not as expected. The messages are printed in reverse order, irrespective if the current step was finished or not!
Hello3
Hello2
Hello1
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Is there an explanation for this? Should I make any changes to the function? Is there an alternative for this using dplyr
, magrittr
, or similar packages?
The %>%
pipe uses lazy evaluation, which means that it doesn't evaluate expressions until it needs to. In practice, this means that it reaches the end of the pipe and then works backward ... the magrittr
package defines an eager pipe operator %!>%
which has the behaviour you want (%>%
actually comes from magrittr
as well; it's imported and re-exported by dplyr
).
In order to print the mtcars
output before Hello3
, you can use a tee pipe (%T>%
), also from magrittr
, plus a print()
statement. The reason the output usually prints at the end is that R only prints the final result of an expression once it has been completely evaluated. (I also added a -> result
to the end of the pipe to store the result and prevent it from being printed again at the end; I added a SleepPipe()
function similar to your CatPipe()
to make the code a little bit prettier.)
library(magrittr)
SleepPipe <- function(x, time = 1) {
Sys.sleep(time)
return(x)
}
mtcars %!>%
CatPipe("Hello1") %!>%
SleepPipe(1) %!>%
CatPipe("Hello2") %!>%
SleepPipe(1) %!>%
head(5) %T>% print() %!>%
CatPipe("Hello3") -> result
|>
also uses lazy evaluation (to test it I had to use the SleepPipe()
function, since the native pipe doesn't allow arbitrary {.}
expressions)magrittr
re-implemented the pipe to use lazy evaluation in version 2.0 ....