Merge dataframes without duplicate rows with NA values

I have data from multiple files that have data hourly observations for different variables. I'm trying to combine them to a single dataframe where I have every observation for each variable for the whole time interval. Some variables have data in multiple dataframes for different time intervals. Each dataframe has a time column that can be used for joining.

The problem is that full_join creates more rows than my data has hours (df1). Instead I would like to get a dataframe (df2) without NA values and extra rows. One solution is to join the dataframes in specific order but I'm hoping for a more general solution that works with larger scale for combining datasets for this type of data.

library(tidyverse)
a <- data.frame(hour = 1:10, x = 5)
b <- data.frame(hour = 1:15, y = 15)
c <- data.frame(hour = 11:20, x = 10)

list <- list(a,b,c)

df1 <- list %>%
  reduce(full_join)

df1
   hour  x  y
1     1  5 15
2     2  5 15
3     3  5 15
4     4  5 15
5     5  5 15
6     6  5 15
7     7  5 15
8     8  5 15
9     9  5 15
10   10  5 15
11   11 NA 15
12   12 NA 15
13   13 NA 15
14   14 NA 15
15   15 NA 15
16   11 10 NA
17   12 10 NA
18   13 10 NA
19   14 10 NA
20   15 10 NA
21   16 10 NA
22   17 10 NA
23   18 10 NA
24   19 10 NA
25   20 10 NA

df2 <- full_join(a,c) %>%
  full_join(b)

df2
   hour  x  y
1     1  5 15
2     2  5 15
3     3  5 15
4     4  5 15
5     5  5 15
6     6  5 15
7     7  5 15
8     8  5 15
9     9  5 15
10   10  5 15
11   11 10 15
12   12 10 15
13   13 10 15
14   14 10 15
15   15 10 15
16   16 10 NA
17   17 10 NA
18   18 10 NA
19   19 10 NA
20   20 10 NA

EDIT: I got the correct result with tidyverse with following code:

df <- list %>%
  reduce(full_join) %>%
  group_by(hour) %>%
  summarise_all(mean, na.rm = T)

With data.table meltand dcast also seem to work

dt1 <- as.data.table(df1)
m <- as.data.table(melt(dt1, id.vars = "hour"))
dc <- dcast.data.table(m, hour~variable, fun.aggregate = mean, na.rm = T)
df <- as.data.frame(dc)

I guess mean can be replaced with some other function but I couldn't figure which.

Solution

Here is a version where the primary key can be a composite (for example hour and group). It doesn't work with aggregating like your own solution you added to the question.

## Identify (composite) primary key 
mykeys <- c("hour")

## Using the data.table package below
list %>%
    lapply(setDT)

## Create a dataset with all unique combinations of the keys to join onto
full <- lapply(list, function(dt) dt[,..mykeys]) %>%
    rbindlist %>%
    unique

## Can make the keys as actual keys on the full dataset but not necessary
# setkeyv(full,mykeys)

## Join all variables onto the full dataset.
## The preserved values of duplicates are from the last dataset in the order of 
## the list object. This step is done by reference.
list %>%
    lapply(function(dt) {
        n <- names(dt)
        full[dt,
             on = mykeys,
             (n) := mget(paste0("i.", n))]
    })
full

Here is an example dataset with a composite primary key and more columns to help identify what is going on. Again, note that the order of the datasets in list matter.

mykeys <- c("hour"
            ,"group"
)

a <- data.frame(hour = 1:10, 
                group = 1,
                x = 5, z = 1, a = 1)
b <- data.frame(hour = 1:15, 
                group = 1,
                y = 15, z = 2, b = 1)
c <- data.frame(hour = 11:20, 
                group = 1,
                # group = 2,
                x = 10, z = 3, c = 1)