r dplyr mutate

How do you count the number of observations in multiple columns and use mutate to make the counts as new columns in R?


I have a dataset with multiple rows of survey responses from different years and different organizations. The survey has 100 questions, and respondents can skip any of them. I am trying to get the average for each question by organization and year (so grouped by organization and year). I also want the number of people who contributed to each average, since questions can be skipped. I want both values as new columns, which adds 200 columns in total. I figured out how to get the average (see the code below), but I can't seem to use the same approach to get the count of observations.

This is how I successfully got the average.

df <- df %>%
  group_by(Organization, Year) %>%
  mutate(across(contains('Question'), mean, na.rm = TRUE, .names = "{.col}_average")) %>%
  ungroup()

I am now trying to use a similar setup to get the count of observations. I duplicated the raw data columns and added Count to their names, so that the new average columns are not picked up as columns that R needs to compute the ncount for.

df <- df %>%
  group_by(Organization, Year) %>%
  mutate(across(contains('Count'), function(x){sum(!is.na(.))}, .names = "{.col}_ncount")) %>%
  ungroup()

The code above does create the new columns, but the count is the same in every column and every row. Any thoughts?


Solution

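  • The issue is in the lambda function: it is declared as function(x), but the sum is computed on . instead of x. Inside a %>% pipeline, . by itself can be evaluated as the whole data, so every new column receives the same total regardless of column or group. Use the argument x instead: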

    library(dplyr)
    df%>%
      group_by(Organization, Year) %>%
      mutate(across(contains('Count'), 
         function(x){sum(!is.na(x))},
          .names = "{.col}_ncount")) %>%
      ungroup()
    

    If we want to use . or .x instead, specify the lambda in formula style with ~:

    df%>%
      group_by(Organization, Year) %>%
      mutate(across(contains('Count'), 
         ~ sum(!is.na(.)),
          .names = "{.col}_ncount")) %>%
      ungroup()
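
    To see the difference on something small, here is a minimal sketch on made-up data. The toy df, its values, and the names buggy and fixed are assumptions for illustration only, not the asker's data. It also shows one possible alternative to duplicating the columns: passing a named list of functions to a single across() call, which, as far as I know, selects its columns once, so the new _average columns are not picked up again when counting.

```r
library(dplyr)

# Hypothetical toy data: two organizations, one year, two questions with skips
df <- tibble(
  Organization = c("A", "A", "A", "B", "B"),
  Year         = c(2020, 2020, 2020, 2020, 2020),
  Question1    = c(1, 2, 3, 4, NA),
  Question2    = c(NA, NA, 2, 4, 6)
)

# Buggy version from the question: `.` is the data piped into mutate(),
# not the current column, so every new column gets the same value
buggy <- df %>%
  group_by(Organization, Year) %>%
  mutate(across(contains("Question"),
                function(x) sum(!is.na(.)),
                .names = "{.col}_ncount")) %>%
  ungroup()

# Fixed version: use the lambda argument (.x here), and compute the
# average and the count together with a named list of functions
fixed <- df %>%
  group_by(Organization, Year) %>%
  mutate(across(contains("Question"),
                list(average = ~ mean(.x, na.rm = TRUE),
                     ncount  = ~ sum(!is.na(.x))),
                .names = "{.col}_{.fn}")) %>%
  ungroup()

print(fixed)
```

    In fixed, the counts differ by group and by column (for example, Question1_ncount is 3 for organization A and 1 for organization B), while in buggy every _ncount column holds a single repeated value.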