sd function returns NA when using group_by() and summarise() in dplyr (no NA values in df)

I've got a df with a binary numeric response variable (0 or 1) and several predictor variables. I am trying to create a table that groups by type (a 3-level variable) and step (7 levels). I want the mean response and standard deviation for each type at each step. The output table should have 21 rows with 4 variables: type, step, mean and sd.

My code looks like this:

data <- data %>% group_by(step, type) %>% summarise(Response = mean(Response), dev = sd(Response))

The output table correctly generates the mean values, but returns NA for all sd values. I tried adding 'na.rm=TRUE', but it made no difference, and there aren't any NA values in Response in the original df anyway. Any ideas?

Solution

The following should work as you expect:

data <- data %>% group_by(step, type) %>% summarise(Response_mean = mean(Response), dev = sd(Response))

As mentioned, you are getting NA because sd() is being given a single value, and the standard deviation of a single value is undefined.
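As a quick check of that claim, here is a minimal sketch in base R. The numbers are made up for illustration and are not taken from your data:

```r
# The sample standard deviation divides by n - 1, which is 0 when n = 1,
# so sd() of a length-one vector is NA.
single <- sd(0.5)

# na.rm = TRUE does not help: there is no NA to remove, only too few values.
single_na_rm <- sd(0.5, na.rm = TRUE)

# With more than one value, sd() returns a number as usual.
several <- sd(c(0, 1, 1, 0))

print(single)
print(single_na_rm)
print(several)
```

This is also why 'na.rm=TRUE' made no difference in your case.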

The reason sd() only sees a single value has to do with the order in which summarise() evaluates its arguments. The following part of your code:

Response = mean(Response)

creates a variable named 'Response' in your new table, holding a single value per group: the mean of the vector 'Response' in your original data. The following part:

dev = sd(Response)

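is evaluated afterwards, so 'Response' now refers to the newly created summary column rather than the original one, and sd() tries to calculate the standard deviation of that single value.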
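Here is a small reproducible sketch of the problem and the fix. The data frame toy is hypothetical (2 steps, 2 types, 2 observations per group), and .groups = "drop" assumes dplyr 1.0.0 or later; it only silences the grouping message and is not part of the fix:

```r
library(dplyr)

# Hypothetical toy data, not the original df
toy <- data.frame(
  step = rep(c(1, 2), each = 4),
  type = rep(c("a", "b"), times = 4),
  Response = c(0, 1, 1, 0, 1, 1, 0, 1)
)

# Original pattern: Response is overwritten by its group mean first,
# so sd() receives a single value per group and returns NA.
buggy <- toy %>%
  group_by(step, type) %>%
  summarise(Response = mean(Response), dev = sd(Response), .groups = "drop")

# Fix: give the mean a new name so sd() still sees the original column.
fixed <- toy %>%
  group_by(step, type) %>%
  summarise(Response_mean = mean(Response), dev = sd(Response), .groups = "drop")

print(buggy)
print(fixed)
```

In buggy every value of dev is NA, while in fixed dev holds the per-group standard deviations.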

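To illustrate, you can try the following as well. Response_plus_10 comes out as the group mean plus 10, which shows that 'Response' already refers to the new summary column: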

data <- data %>% group_by(step, type) %>% summarise(Response = mean(Response), Response_plus_10 = Response + 10)
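The same ordering rule suggests an alternative if you want the summary column to keep the name Response: compute the standard deviation first. A minimal sketch, again on hypothetical toy data and assuming dplyr 1.0.0 or later for .groups:

```r
library(dplyr)

# Hypothetical toy data: 2 steps x 2 types, 4 binary responses per group
toy <- data.frame(
  step = rep(c(1, 2), each = 8),
  type = rep(rep(c("a", "b"), each = 4), times = 2),
  Response = c(0, 1, 1, 0,  1, 1, 1, 0,  0, 0, 1, 0,  1, 1, 1, 1)
)

# sd() comes first, so it still sees the original Response column;
# only afterwards is Response overwritten by its group mean.
reordered <- toy %>%
  group_by(step, type) %>%
  summarise(dev = sd(Response), Response = mean(Response), .groups = "drop")

print(reordered)
```

Using a distinct name such as Response_mean is probably the safer choice, since it does not depend on the order of the arguments.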

Hope this clarifies the issue.