Search code examples
rsummarize

Using summarize_all() to get a count of values in multiple columns


I want to create a table that counts up the number of non-NA values in each column of a dataset. I'm using summarize_all(), but I'm struggling to get a count of non-NA values.

I referred to this StackOverflow thread to get some insight, but it didn't seem to help me obtain a count: summarize_all with "n()" function. I'll either get a count of distinct values if I pass summarize_all(n_distinct) or I'll get the number of rows if I pass summarize_all(list(n=~n())) or summarize_all(list(n="length") as the thread cites.

My desired output:

ID Female  Male Non_Binary
 5      5     4          3

Where am I going wrong?

# Sample Code

test<-as_tibble(data.frame(`ID` = c("1","2","3","4","5"),
                           `Female` = c("Female","Female","Female","Female","Female"),
                           `Male` = c(NA,"Male","Male","Male","Male"),
                           `Non_Binary`=c("Non-Binary","Non-Binary","Non-Binary",NA,NA)))

## Attempt 1

summary<-test%>%
  summarize_all(list(n=~n()))

# A tibble: 1 × 4
   ID_n Female_n Male_n Non_Binary_n
  <int>    <int>  <int>        <int>
1     5        5      5            5

## Attempt 2

summary<-test%>%
  summarize_all(list(n="length"))

# A tibble: 1 × 4
   ID_n Female_n Male_n Non_Binary_n
  <int>    <int>  <int>        <int>
1     5        5      5            5

## Attempt 3

summary<-test%>%
  summarize_all(n_distinct)

# A tibble: 1 × 4
     ID Female  Male Non_Binary
  <int>  <int> <int>      <int>
1     5      1     2          2

### Desired Output

ID Female  Male Non_Binary
 5      5     4          3

Solution

  • n() and length() are poor choices for this - they don't omit NA values from what they count. The classic way to count values that meet conditions (such as "not NA") is to sum the condition.

    summarize_all() has also been deprecated for a few years. Now it is preferred to use across().

    test |>
      summarize(across(everything(), \(x) sum(!is.na(x))))
    
    # # A tibble: 1 × 4
    #      ID Female  Male Non_Binary
    #   <int>  <int> <int>      <int>
    # 1     5      5     4          3