I have a data frame with 6 columns like this. You can see that the dates are duplicates. How can I merge the rows and keep the information that's in the following columns ?
date 1 2 3 4 5
2019-01-01 NA 1966439. NA NA NA
2019-01-01 NA NA NA 133.6 NA
2019-01-01 NA NA NA NA 6.2
2019-02-01 NA 1962946 NA NA NA
2019-02-01 NA NA NA 134.5 NA
2019-02-01 NA NA NA NA 6.1
2019-03-01 NA 1974072 NA NA NA
2019-03-01 NA NA NA 135.4 NA
2019-03-01 NA NA NA NA 6.3
2019-04-01 NA 1984086 NA NA NA
I want something like this, with no duplicates on the dates.
date 1 2 3 4 5
2019-01-01 NA 1966439 NA 133.6 6.2
2019-02-01 NA 1962946 NA 134.5 6.1
2019-03-01 NA 1974072 NA 135.4 6.3
2019-04-01 NA 1984086 NA NA NA
thank you very much
I pasted a solution below. Hopefully my comments explain the solution well enough.
#Packages used
library(dplyr)
#Some reproducible data
dta <- data.frame(
date = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
a = c(NA, NA, NA, NA, NA, NA, NA, NA, NA),
x = c(123, NA, NA, 3456, NA, NA, 2345, NA, NA),
y = c(NA, 123, NA, NA, 3456, NA, NA, 2345, NA),
z = c(NA, NA, 123, NA, NA, 3456, NA, NA, 2345)
)
dta <- dta |>
group_by(date) |> #To group by the dates
dplyr::summarise(a = sum(a, na.rm = TRUE), #just summarise the single value (min(), mean(), etc. work just as well)
x = sum(x, na.rm = TRUE),
y = sum(y, na.rm = TRUE),
z = sum(z, na.rm = TRUE)) |>
select_if(~sum(.) > 0) #Remove columns with sum of 0 (columns with all NA)