Aim is to concatenate rows toString (which contain NAs) based on unique identifiers
id year cat_1 cat_2
001 2021 Too high NA
001 2021 YOY error YOY error
002 2021 Too high Too low
002 2021 NA YOY error
003 2021 Too high NA
003 2021 YOY error NA
Looking for a more efficient solution than:
df <- df %>% group_by(id, year) %>% summarise(across(everything(), toString, na.rm = TRUE))
Which results in NA being concatenated into the string
id year cat_1 cat_2
001 2021 Too high, YOY error NA, YOY error
002 2021 Too high, NA Too low, YOY error
003 2021 Too high, YOY error NA, NA
And then replacing the string NAs with blanks and blanks with NA:
df[] <- lapply(df, gsub, pattern = "NA, ", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = ", NA", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = "NA", replacement = "", fixed = TRUE)
df[df==""] <- NA
I assume I am misuing na.rm when using summarise? Alternatively is there a different method?
df %>%
group_by(id, year) %>%
summarise(across(everything(), ~toString(na.omit(.x))))
# A tibble: 3 x 4
# Groups: id [3]
id year cat_1 cat_2
<int> <int> <chr> <chr>
1 1 2021 Too high, YOY error "YOY error"
2 2 2021 Too high "Too low, YOY error"
3 3 2021 Too high, YOY error ""
Base R:
aggregate(.~id + year, df, \(x)toString(na.omit(x)), na.action = identity)
id year cat_1 cat_2
1 1 2021 Too high, YOY error YOY error
2 2 2021 Too high Too low, YOY error
3 3 2021 Too high, YOY error