Tags: r, filter, tidyverse

How to count observations after each filter in the tidyverse (R)?


I often apply filters in R, e.g. when selecting a particular sample or feature, or when excluding implausible values. Is there a fast and easy way to keep track of how many units were removed by each filter? I would like to save the number of observations to a .csv or .txt file. As far as I know, Stata reports the number of observations used after every step in its log file. What can I do in R?

data(mtcars)

library(tidyverse)

sample <- mtcars %>% # 32 obs
  filter(mpg > 20) %>%  # 14 obs
  filter(cyl == 4) %>%  # 11 obs
  filter(am == 0) # 3 obs

Solution

  • You can use tidylog, a package that wraps dplyr and tidyr functions, to print a summary to the console each time you call filter() or another supported tidyverse function:

    library(tidylog) # load after the tidyverse so tidylog's wrappers mask dplyr's
    sample <- mtcars %>% # 32 obs
      filter(mpg > 20) %>%  # 14 obs
      filter(cyl == 4) %>%  # 11 obs
      filter(am == 0) # 3 obs

    # Messages printed to the console:
    #filter: removed 18 rows (56%), 14 rows remaining
    #filter: removed 3 rows (21%), 11 rows remaining
    #filter: removed 8 rows (73%), 3 rows remaining
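
tidylog reports to the console only. If you also want the counts in a .csv file, as the question asks, one option is a small wrapper around filter() that records the number of rows before and after each step. The following is a minimal sketch, not part of dplyr or tidylog: the names filter_counted and filter_log and the file name filter_log.csv are made up for illustration.

```r
library(dplyr)

# Hypothetical log store (not part of dplyr or tidylog): an environment
# holding one data frame row per filter step.
filter_log <- new.env()
filter_log$steps <- data.frame()

# Hypothetical wrapper: applies dplyr::filter() and records the row counts.
filter_counted <- function(.data, ...) {
  out <- dplyr::filter(.data, ...)

  # Turn the unevaluated filter conditions into readable text, e.g. "mpg > 20"
  conditions <- vapply(
    as.list(substitute(list(...)))[-1],
    function(e) paste(deparse(e), collapse = " "),
    character(1)
  )

  filter_log$steps <- rbind(
    filter_log$steps,
    data.frame(
      step      = nrow(filter_log$steps) + 1,
      condition = paste(conditions, collapse = " & "),
      n_before  = nrow(.data),
      n_after   = nrow(out),
      n_removed = nrow(.data) - nrow(out)
    )
  )
  out
}

cars_subset <- mtcars %>%
  filter_counted(mpg > 20) %>%
  filter_counted(cyl == 4) %>%
  filter_counted(am == 0)

# One row per filter step; n_after is 14, 11 and 3 for the steps above
write.csv(filter_log$steps, "filter_log.csv", row.names = FALSE)
```

Because the log lives in an environment, it keeps growing across pipelines. Reset it with filter_log$steps <- data.frame() before starting a new one.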