When I use a nested function call in a pipe step, the order of execution seems unintuitive.
df <- data.frame(a = c(1,NA,2), b = c(NA, NA, 1))
df %>% is.na %>% colSums # Produces the correct count of missing values
df %>% colSums(is.na(.)) # Produces NA
Can anyone explain why the nested function in the third line does not produce the correct result?
It's because when the . appears only inside a nested call, magrittr still passes the left-hand side as the first argument to the following function. So in your second attempt at colSums, you assume that you're passing is.na(.) as the first argument to colSums, but you're actually passing it as the second, which is the na.rm parameter. What your code actually runs is df %>% colSums(x = ., na.rm = is.na(.)). You can prevent the left-hand side from being inserted as the first argument by wrapping the call in {}:
df %>% {colSums(is.na(.))}
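To make the difference concrete, here is a minimal sketch that runs the variants side by side, assuming the magrittr package is installed. The variable names (chained, nested, explicit, braced, lambda) are only illustrative, and the anonymous-function form at the end is an alternative not mentioned above.

```r
library(magrittr)

df <- data.frame(a = c(1, NA, 2), b = c(NA, NA, 1))

# Chained steps: each function receives the previous result as its
# first argument, so this is colSums(is.na(df)).
chained <- df %>% is.na %>% colSums

# Dot used only inside a nested call: df is still inserted as the
# first argument, so is.na(.) ends up in the na.rm position.
nested <- df %>% colSums(is.na(.))

# The same call written without the pipe, for comparison.
explicit <- colSums(df, is.na(df))

# Braces turn the right-hand side into an expression, so nothing is
# inserted automatically and the dot is used only where it is written.
braced <- df %>% {colSums(is.na(.))}

# Alternative: pipe into a parenthesised anonymous function.
lambda <- df %>% (function(d) colSums(is.na(d)))

print(chained)
print(braced)
print(identical(nested, explicit))
```

Here chained, braced, and lambda should all give the per-column count of missing values, while nested should match the explicit colSums(df, is.na(df)) call rather than the intended count.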