I'm trying to perform a group_by summarise on a categorical variable, frailty score. The data is structured such that there are multiple observations for each subject, some of which contain missing data e.g.
Subject Frailty
1 Managing well
1 NA
1 NA
2 NA
2 NA
2 Vulnerable
3 NA
3 NA
3 NA
I would like the data to be summarised so that a frailty description appears if there is one available, and NA if not e.g.
Subject Frailty
1 Managing well
2 Vulnerable
3 NA
I tried the following two approaches which both returned errors:
Mode <- function(x) {
ux <- na.omit(unique(x[!is.na(x)]))
tab <- tabulate(match(x, ux)); ux[tab == max(tab)]
}
data %>%
group_by(Subject) %>%
summarise(frailty = Mode(frailty)) %>%
Error: Expecting a single value: [extent=2].
condense <- function(x){unique(x[!is.na(x)])}
data %>%
group_by(subject) %>%
summarise(frailty = condense(frailty))
Error: Column frailty must be length 1 (a summary value), not 0
If there are only one a single non-NA element, then after grouping by 'Subject', get the first non-NA element
library(dplyr)
data %>%
group_by(Subject) %>%
summarise(Frailty = Frailty[which(!is.na(Frailty))[1]])
# A tibble: 3 x 2
# Subject Frailty
# <int> <chr>
#1 1 Managing well
#2 2 Vulnerable
#3 3 <NA>
If there are more than one non-NA unique elements, either we paste
them together or return as a list
data %>%
group_by(Subject) %>%
summarise(Frailty = na_if(toString(unique(na.omit(Frailty))), ""))
# A tibble: 3 x 2
# Subject Frailty
# <int> <chr>
#1 1 Managing well
#2 2 Vulnerable
#3 3 <NA>
data <- structure(list(Subject = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L
), Frailty = c("Managing well", NA, NA, NA, NA, "Vulnerable",
NA, NA, NA)), class = "data.frame", row.names = c(NA, -9L))