Tags: r, tidyverse, magrittr

Print data frame dimensions at each step of filtering


I am using the tidyverse to filter a data frame and would like to print the dimensions (or number of rows) of the intermediate objects at each step. I thought I could simply use the tee pipe operator from magrittr, but it doesn't work. I think I understand the concept behind the tee pipe, but I can't figure out what is wrong. I searched extensively but didn't find many resources about the tee pipe.

I built a simple example with the mtcars dataset. Printing the intermediate objects works, but nothing is printed if I use dim() or nrow() instead.

library(tidyverse)
library(magrittr)

mtcars %>% 
    filter(cyl > 4) %T>% dim() %>%
    filter(am == 0) %T>% dim() %>%
    filter(disp >= 200) %>% dim()

I can of course write that in base R, but I would like to stick to the tidyverse spirit. I have probably overlooked something about the tee pipe concept, and any comments/solutions will be greatly appreciated.

EDIT: Following the nice and quick answers from @hrbrmstr and @akrun, I tried again to stick to the tee pipe operator without writing a function. I don't know why I didn't find the answer earlier myself, but here is the syntax I was looking for:

mtcars %>%
    filter(cyl > 4) %T>% {print(dim(.))} %>%
    filter(am == 0) %T>% {print(dim(.))} %>%
    filter(disp >= 200) %>% {print(dim(.))}

Although it requires a function, @hrbrmstr's solution is indeed easier to "clean up".
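
For what it's worth, the reason the first attempt stays silent can be sketched like this. The tee pipe evaluates its right-hand side only for its side effects and throws the result away, and dim() has no side effect: it just returns a value. Wrapping it in print() supplies the side effect. This is a minimal sketch using only magrittr; the names `silent` and `shown` are made up for illustration.

```r
library(magrittr)

# %T>% returns its left-hand side; the value computed by dim() is
# discarded, so nothing reaches the console.
silent <- mtcars %T>% dim()
identical(silent, mtcars)   # the data frame passes through unchanged

# print() is a side effect, so the dimensions show up on the console,
# and the data frame still passes through to the next step.
shown <- mtcars %T>% {print(dim(.))}
identical(shown, mtcars)
```

The same reasoning applies to nrow(): it has to be wrapped in print() (or a similar side-effecting call) before the tee pipe shows anything.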


Solution

  • @akrun's idea works, but it's not idiomatic tidyverse. Other functions used in tidyverse pipelines, like print() and glimpse(), return the data parameter invisibly so they can be piped without resorting to {}. Those {} make it difficult to clean up pipes after you're done exploring what's going on.

    Try:

    library(tidyverse)
    
    tidydim <- function(x) {
      print(dim(x))
      invisible(x)
    }
    
    mtcars %>%
      filter(cyl > 4) %>%
      tidydim() %>% 
      filter(., am == 0) %>%
      tidydim() %>% 
      filter(., disp >= 200) %>%
      tidydim()
    

    That way your "cleanup" (i.e. not producing interim console output) can be as quick and easy as removing the tidydim() lines or removing the print(…) from the function.
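
As a rough sketch of the second cleanup route, the helper could take a switch so the printing is turned off in one place while the pipeline itself stays untouched. The `verbose` argument and the `tidydim.verbose` option name are hypothetical, invented for this example and not part of any package.

```r
library(dplyr)

# Hypothetical variant of tidydim(): prints the dimensions only when
# verbose is TRUE. The option name "tidydim.verbose" is made up here.
tidydim <- function(x, verbose = getOption("tidydim.verbose", TRUE)) {
  if (isTRUE(verbose)) print(dim(x))
  invisible(x)  # hand the data back unchanged so the pipe continues
}

# Silence every tidydim() call at once, without editing the pipeline.
options(tidydim.verbose = FALSE)

result <- mtcars %>%
  filter(cyl > 4) %>%
  tidydim() %>%
  filter(am == 0) %>%
  tidydim() %>%
  filter(disp >= 200)
```

Setting the option back to TRUE (or removing it) brings the interim output back, which may be handy when the pipeline needs another round of debugging.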