
How to show the number of dropped observations in R?


For different analyses, I use different samples, but I need to make it clear how the samples came about.

Stata shows me "XX observations dropped" after each drop command. Is there a way to get R to print the number of observations dropped at each step of a tidyverse-style sample selection (see below)?

In this example, I would like to see in the console how many observations were dropped by the filter and drop_na calls. I tried summarise_all(~sum(is.na(.))), but without success.

capmkt_df <- stata_df %>%
  filter(change != 1 & reg_mkt == 1) %>% 
  select(any_of(capmkt_vars)) %>%
  mutate_at(vars(country, year), factor) %>%
  drop_na()

Solution

  • Since you're using tidyverse packages, have a look at tidylog, a package that prints feedback about many dplyr and tidyr operations, such as how many rows a step removed.

    For example, drop_na() then prints a message of the form "drop_na: removed X rows". An illustration with the built-in airquality dataset:

    library(tidyverse)
    # Load tidylog after the tidyverse so that its wrappers mask the dplyr/tidyr verbs
    library(tidylog, warn.conflicts = FALSE)
    
    airquality %>% 
      drop_na()
    
    # drop_na: removed 42 rows (27%), 111 rows remaining
    #     Ozone Solar.R Wind Temp Month Day
    # 1      41     190  7.4   67     5   1
    # 2      36     118  8.0   72     5   2
    # 3      12     149 12.6   74     5   3
    # 4      18     313 11.5   62     5   4
    # 5      23     299  8.6   65     5   7
    # 6      19      99 13.8   59     5   8
    # 7       8      19 20.1   61     5   9
    # 8      16     256  9.7   69     5  12
    # 9      11     290  9.2   66     5  13
    # 10     14     274 10.9   68     5  14
    # ...
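
If installing another package is not an option, the same kind of message can be produced by hand. The following is only a minimal sketch: log_dropped is a hypothetical helper (it is not part of dplyr, tidyr or tidylog), and the Month != 5 condition is an arbitrary stand-in for the filter in the question. It compares the row count before and after each step:

```r
library(dplyr)
library(tidyr)

# Hypothetical helper, not part of any package: applies `step` (a function
# taking and returning a data frame) to `data` and reports how many rows
# that step removed.
log_dropped <- function(data, step, label = "step") {
  result <- step(data)
  dropped <- nrow(data) - nrow(result)
  message(sprintf("%s: %d observations dropped, %d remaining",
                  label, dropped, nrow(result)))
  result
}

# Each selection step is wrapped so that its row loss is printed separately.
clean <- airquality %>%
  log_dropped(function(d) filter(d, Month != 5), "filter") %>%
  log_dropped(drop_na, "drop_na")
```

The messages go to the console through message(), so they can be silenced with suppressMessages() when the counts are not needed.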