I am trying to calculate slopes for multiple measures in a grouped dataset using lm() in R. However, some groups have all NA values for certain measures, which causes the following error:
Error in `map()`:
i In index: 2.
Caused by error in `summarize()`:
i In argument: `Slope = (lm(reformulate("nDay", measure)))$coefficients[2]`.
i In group 5: `Subject = 3`, `Response = x`.
Caused by error in `lm.fit()`:
! 0 (non-NA) cases
Run `rlang::last_trace()` to see where the error occurred.
I understand that this error occurs because some groups have all NA values, making it impossible to fit a linear model. However, I haven't been able to figure out how to handle these cases by returning NA for the slope instead of crashing.
Here is a minimal working example for my code:
library(dplyr)
library(tidyr)
library(purrr)
calculate_slope = function(df, measure) {
  df %>%
    summarize(Measure = measure,
              Slope = (lm(reformulate("nDay", measure)))$coefficients[2],
              .groups = "drop")
}
example_data = expand.grid(
  Subject = 1:3,
  Response = c("x", "y"),
  nDay = 1:3
) %>%
  mutate(
    A = runif(n(), 0, 1),
    B = runif(n(), 0, 1),
    C = runif(n(), 0, 1)
  )
# Set some values to NA
example_data = example_data %>%
  mutate(B = ifelse(Subject == 3, NA, B))
measures = c("A", "B", "C")
summary_data = map_dfr(measures, ~ example_data %>%
                         group_by(Subject, Response) %>%
                         calculate_slope(., .x)) %>%
  pivot_wider(names_from = Measure, values_from = Slope) %>%
  rename_with(~ paste0("slope_", .), -c(Subject, Response))
I have tried modifying the calculate_slope function to check for all NA values, but I cannot make it work properly because the grouping isn't preserved and then I get the same value across the full measure. The goal is to calculate the slopes for each measure (A, B, C) grouped by Subject and Response, and return NA for the slope if a group has all NA values and a model can't be fit.
You can use the condition-handling function tryCatch() and set error = function(e) NA to return NA when lm() throws an error.
calculate_slope = function(df, measure) {
  df %>%
    summarize(Measure = measure,
              Slope = tryCatch(lm(reformulate("nDay", measure))$coefficients[2],
                               error = function(e) NA),
              .groups = "drop")
}
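To see what tryCatch() is doing in isolation, here is a minimal base-R sketch. The helper name safe_slope is hypothetical (it is not part of your code), and I return NA_real_ rather than plain NA so the result is always a double; that choice is my assumption, not something the fix above requires.

```r
# Hypothetical helper: slope of y ~ x, or NA_real_ if lm() throws an error
safe_slope <- function(y, x) {
  tryCatch(
    unname(coef(lm(y ~ x))[2]),     # second coefficient is the slope
    error = function(e) NA_real_    # e.g. "0 (non-NA) cases" from lm.fit()
  )
}

x <- 1:3
ok_slope  <- safe_slope(c(0.1, 0.4, 0.9), x)   # model fits normally
bad_slope <- safe_slope(rep(NA_real_, 3), x)   # lm() errors, so NA_real_ is returned
```

Note that this catches every error raised inside lm(), not only the all-NA case, so a typo in a column name would also silently become NA.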
I think the above modification should enable your example code to execute successfully. Below, I provide an alternative approach that achieves the exact same results as your code but uses more concise syntax (only depending on dplyr).
measures <- c("A", "B", "C")

example_data %>%
  group_by(Subject, Response) %>%
  summarise(across(all_of(measures),
                   ~ tryCatch(lm(.x ~ nDay)$coefficients[2], error = \(e) NA),
                   .names = "slope_{.col}"),
            .groups = "drop")
# # A tibble: 6 × 5
#   Subject Response  slope_A slope_B slope_C
#     <int> <fct>       <dbl>   <dbl>   <dbl>
# 1       1 x         0.195    0.318   -0.246
# 2       1 y         0.00840  0.0513   0.105
# 3       2 x        -0.108   -0.0261   0.321
# 4       2 y        -0.347   -0.308    0.328
# 5       3 x        -0.153   NA       -0.136
# 6       3 y        -0.00175 NA       -0.146
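If you would rather not swallow every error, a possible variant is to test for the all-NA case explicitly instead of using tryCatch(). This is only a sketch under a few assumptions of mine: the threshold of at least two non-missing values, the use of NA_real_, and the set.seed(1) call (added purely for reproducibility) are not from your question.

```r
library(dplyr)

set.seed(1)  # assumed seed, only so the sketch is reproducible

# Same shape as the example data in the question
example_data <- expand.grid(Subject = 1:3, Response = c("x", "y"), nDay = 1:3) %>%
  mutate(A = runif(n()), B = runif(n()), C = runif(n())) %>%
  mutate(B = ifelse(Subject == 3, NA, B))

measures <- c("A", "B", "C")

slopes <- example_data %>%
  group_by(Subject, Response) %>%
  summarise(
    across(all_of(measures),
           # Fit only when the group has at least two non-missing values;
           # otherwise return a missing slope without calling lm()
           ~ if (sum(!is.na(.x)) < 2) NA_real_ else unname(coef(lm(.x ~ nDay))[2]),
           .names = "slope_{.col}"),
    .groups = "drop"
  )
```

With this version, any other problem inside lm() still surfaces as an error, which can make debugging easier than a blanket error handler.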