Search code examples
rdataframedateaggregate

Bind or merge rows by duplicate dates in R


I have a data frame with 6 columns like this. You can see that the dates are duplicates. How can I merge the rows and keep the information that's in the following columns ?

date     1       2          3    4    5

2019-01-01  NA  1966439.    NA  NA    NA
2019-01-01  NA  NA          NA  133.6 NA
2019-01-01  NA  NA          NA  NA    6.2
2019-02-01  NA  1962946     NA  NA    NA
2019-02-01  NA  NA          NA  134.5 NA
2019-02-01  NA  NA          NA  NA    6.1
2019-03-01  NA  1974072     NA  NA    NA
2019-03-01  NA  NA          NA  135.4 NA
2019-03-01  NA  NA          NA  NA    6.3
2019-04-01  NA  1984086     NA  NA    NA

I want something like this, with no duplicates on the dates.

date        1     2      3    4   5 

2019-01-01  NA  1966439 NA  133.6 6.2
2019-02-01  NA  1962946 NA  134.5 6.1
2019-03-01  NA  1974072 NA  135.4 6.3
2019-04-01  NA  1984086 NA  NA    NA

thank you very much


Solution

  • I pasted a solution below. Hopefully my comments explain the solution well enough.

    #Packages used
    library(dplyr)
    
    #Some reproducible data
    dta <- data.frame(
      date = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
      a = c(NA, NA, NA, NA, NA, NA, NA, NA, NA),
      x = c(123, NA, NA, 3456, NA, NA, 2345, NA, NA),
      y = c(NA, 123, NA, NA, 3456, NA, NA, 2345, NA),
      z = c(NA, NA, 123, NA, NA, 3456, NA, NA, 2345)
    )
    
    dta <- dta |> 
      group_by(date) |> #To group by the dates
      dplyr::summarise(a = sum(a, na.rm = TRUE), #just summarise the single value (min(), mean(), etc. work just as well)
                x = sum(x, na.rm = TRUE),
                y = sum(y, na.rm = TRUE),
                z = sum(z, na.rm = TRUE)) |> 
      select_if(~sum(.) > 0) #Remove columns with sum of 0 (columns with all NA)