Tags: r, dplyr, magrittr

Order of execution of nested functions in dplyr pipe


When I use a nested function in a piping step, the order of execution seems unintuitive.

library(dplyr) # provides the %>% pipe
df <- data.frame(a = c(1, NA, 2), b = c(NA, NA, 1))
df %>% is.na %>% colSums # Produces the correct count of missing values
df %>% colSums(is.na(.)) # Produces NA

Can anyone explain why the nested function in the last line does not produce the correct result?


Solution

  • It's because the pipe inserts the left-hand side as the first argument of the following function unless . appears as a direct argument of that call. A . nested inside another call, as in is.na(.), does not count, so df is still inserted first. In your second attempt at colSums, you assume that you're passing is.na(.) as the first argument to colSums, but you're actually passing it as the second, which is the na.rm parameter. So what your code actually looks like is df %>% colSums(x = ., na.rm = is.na(.)). You can prevent the left-hand side from being inserted as the first argument by wrapping the call in {}: df %>% {colSums(is.na(.))}
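
To make the argument placement concrete, here is a minimal sketch that reuses the df from the question. It assumes magrittr is installed (dplyr re-exports the same %>%). The variable names piped, explicit, fixed and also_fixed are only illustrative.

```r
library(magrittr)  # provides %>%; dplyr re-exports the same operator

df <- data.frame(a = c(1, NA, 2), b = c(NA, NA, 1))

# A dot nested inside is.na() does not suppress first-argument insertion,
# so the piped call and the explicit call below should be the same thing.
piped    <- df %>% colSums(is.na(.))
explicit <- colSums(df, na.rm = is.na(df))
identical(piped, explicit)

# Braces make the right-hand side an expression in which the dot is used
# only where it is written, so df is not inserted as the first argument.
fixed <- df %>% {colSums(is.na(.))}
fixed  # a = 1, b = 2

# An anonymous function in parentheses is another way to get the same result
# (d is just an illustrative argument name).
also_fixed <- df %>% (function(d) colSums(is.na(d)))
identical(fixed, also_fixed)
```

Chaining the steps, as in df %>% is.na %>% colSums, avoids the issue entirely and is often the easier form to read.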