I've got a data frame with a binary numeric response variable (0 or 1) and several other variables. I am trying to create a table that groups by type (a 3-level variable) and step (7 levels). I want the mean response and standard deviation for each type at each step. The output table should have 21 rows (one per type-step combination) with 4 variables: type, step, mean and sd.
My code looks like this:
data <- data %>% group_by(step, type) %>% summarise(Response = mean(Response), dev = sd(Response))
The output table contains the correct mean values, but the sd column is NA in every row. I tried adding 'na.rm=TRUE', but the Response column in the original data frame has no NA values to remove. Any ideas?
The following should work as you expect. The only change is that the mean is stored under a new name, Response_mean:
data <- data %>% group_by(step, type) %>% summarise(Response_mean = mean(Response), dev = sd(Response))
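To check the fix end to end, here is a minimal self-contained sketch. The column names step, type and Response come from the question; the data itself is hypothetical, simulated with rbinom() (10 made-up observations per type-step combination), and summary_tbl is just an illustrative name. The `.groups = "drop"` argument (dplyr 1.0.0 or later) only returns an ungrouped table and does not affect the values.

```r
library(dplyr)

# Hypothetical simulated data: 3 types x 7 steps x 10 observations,
# with a 0/1 Response drawn at random (illustration only)
set.seed(42)
data <- expand.grid(
  type = c("A", "B", "C"),
  step = 1:7,
  obs  = 1:10
)
data$Response <- rbinom(nrow(data), size = 1, prob = 0.5)

# The mean gets a new name, so sd() still sees the raw 0/1 values
summary_tbl <- data %>%
  group_by(step, type) %>%
  summarise(
    Response_mean = mean(Response),
    dev = sd(Response),
    .groups = "drop"
  )

print(summary_tbl)  # one row per type-step combination
```

With 10 observations per group, dev is a real number in every row (it can be 0 if a group happens to be all 0s or all 1s, but not NA).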
You are getting NA because sd() is being given a single value per group, and R returns NA for the standard deviation of a single value (sd() divides by n - 1, which is zero when n = 1).
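A quick base-R check of that behaviour, independent of dplyr (single_value is a hypothetical stand-in for one group's mean):

```r
# A single number, like the one group mean sd() ends up receiving
single_value <- 0.4

# With n = 1 the (n - 1) denominator is zero, so R returns NA, not 0
sd(single_value)

# With more than one value the standard deviation is defined
sd(c(0, 1, 1, 0))
```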
Why does sd() get a single value? It comes from the order in which summarise() evaluates its arguments: each column you create is immediately available to the arguments that follow it. In your code, this part:
Response = mean(Response)
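creates a column named 'Response' in your new table, holding a single value per group: the mean of the original 'Response' values in that group. Because it reuses the name 'Response', it replaces the original column for every argument that comes after it. So the next part: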
dev = sd(Response)
computes the standard deviation of that one mean value per group instead of the raw 0/1 values, which is why you get NA.
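Here is a small deterministic sketch of the masking, using a hypothetical toy data frame (toy, masked and reordered are illustrative names). It also shows an alternative to renaming: calling sd() before the mean overwrites Response, so sd() still sees the raw column.

```r
library(dplyr)

# Hypothetical toy data: two groups of four 0/1 responses
toy <- data.frame(
  type = rep(c("A", "B"), each = 4),
  Response = c(0, 1, 1, 0,  1, 1, 1, 0)
)

# Original order: Response is replaced by its group mean before sd() runs
masked <- toy %>%
  group_by(type) %>%
  summarise(Response = mean(Response), dev = sd(Response), .groups = "drop")

# Alternative order: sd() runs first, while Response is still the raw column
reordered <- toy %>%
  group_by(type) %>%
  summarise(dev = sd(Response), Response = mean(Response), .groups = "drop")

print(masked)     # dev is NA in every row
print(reordered)  # dev holds the within-group standard deviation
```

Renaming the mean, as in the fix above, is usually clearer than relying on argument order, but both avoid the masking.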
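To illustrate, you can try this as well. Response_plus_10 comes out as the group mean plus 10, not the raw values plus 10 (the result is only printed, so your original data is not overwritten):
data %>% group_by(step, type) %>% summarise(Response = mean(Response), Response_plus_10 = Response + 10)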
Hope this clarifies the issue.