I was trying to find the answer to this but couldn't. If there is an answer I apologize and will immediately delete my question.
I'm trying to merge several rows into one (this calculation should be done separately on groups, in this case variable id
can be used to group), so that no NA values are left.
# initial dataframe
df_start <- data.frame(
id = c("as", "as", "as", "as", "as", "bs", "bs", "bs", "bs", "bs"),
b = c(NA, NA, NA, NA, "A", NA, NA, 6, NA, NA),
c = c(2, NA, NA, NA, NA, 7, NA, NA, NA, NA),
d = c(NA, 4, NA, NA, NA, NA, 8, NA, NA, NA),
e = c(NA, NA, NA, 3, NA, NA, NA, NA, "B", NA),
f = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 10))
# desired output
df_end <- data.frame(id = c("as", "bs"),
b = c("A", 6),
c = c(2, 7),
d = c(4, 8),
e = c(3,"B"),
f = c(5, 10))
No need to delete the question, it may be helpful to some users. This summarises each group to the first non NA occurrence for each column.
library(dplyr)
df_start <- data.frame(
id = c("as", "as", "as", "as", "as", "bs", "bs", "bs", "bs", "bs"),
b = c(NA, NA, NA, NA, "A", NA, NA, 6, NA, NA),
c = c(2, NA, NA, NA, NA, 7, NA, NA, NA, NA),
d = c(NA, 4, NA, NA, NA, NA, 8, NA, NA, NA),
e = c(NA, NA, NA, 3, NA, NA, NA, NA, "B", NA),
f = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 10))
df_start %>%
group_by(id) %>%
summarise_all(list(~first(na.omit(.))))
Output:
# A tibble: 2 x 6
id b c d e f
<fct> <fct> <dbl> <dbl> <fct> <dbl>
1 as A 2. 4. 3 5.
2 bs 6 7. 8. B 10.
You will, of course, get some data lost if there is more than one occurrence of a value with each group for each column.