Search code examples
rdplyrrbind

R bind two data frames with different time observations


I have two data frames f1 and f2

row11 <- c("a", "c", "2000", "2001", "2005")
row12 <- c("", "", 7, 10, 15)

f1 <- as.data.frame(rbind(row11, row12))

row21 <- c("a", "b", "c", "2002", "2005")
row22 <- c("", "", "", 1, 15)

f2 <- as.data.frame(rbind(row21, row22)) 

I would like to bind these two data frames in such a way, that I get the data frame f3:

row31 <- c("a", "b", "c", seq(2000,2005,1))

row32 <- c("", "", "", 7, 10, NA, NA, NA, 15)

row33 <- c("", "", "", NA, NA, 1, NA, NA, 15)

f3 <- rbind(row32, row33)
colnames(f3)  <- row31 

f3:

      a  b  c  2000 2001 2002 2003 2004 2005
row32 "" "" "" "7"  "10" NA   NA   NA   "15"
row33 "" "" "" NA   NA   "1"  NA   NA   "15"

f3 takes the column names "a", "b" and "c" and adds a time series from 2000 to 2005 with the corresponding values from f1 and f2.

I would prefer a solution with dplyr.


Solution

  • You could try the following (mostly) dplyr solution, which defines a helper function to bind the rows after renaming and removing the first row, then orders the columns.

    While I'm sure there is a more elegant way, I use a little base R to identifythe missing year columns, then back to tidyverse to add in new columns and reorder:

    # helper function
    f <- function(x){
      x %>%
        setNames(unlist(x[1,])) %>%
        slice(-1)
    }
    
    # combine all dfs
    f3_temp <- list(f1, f2) %>%
      purrr::map(f) %>%
      bind_rows()
    
    # Note, if all your data frames contain the pattern "f" followed by numbers,
    # you could replace `list(f1, f2) %>% ... with:
    f3_temp <- mget(ls(pattern = "f\\d+")) %>%
      purrr::map(f) %>%
      bind_rows()
    
    # Identify in missing years in column names
    num_names <- as.numeric(names(f3_temp))[!is.na(as.numeric(names(f3_temp)))]
    new_names <- setdiff(seq(min(num_names), max(num_names), 1L), names(f3_temp))
    
    # Add in new and reorder all columns:
    f3 <- f3_temp %>% 
      tibble::add_column(!!!new_names) %>%
      mutate(across(as.character(new_names), ~ NA)) %>%
      select(order(nchar(colnames(.)), colnames(.)))
    

    Output:

    #       a    b c 2000 2001 2002 2003 2004 2005
    # row12             7   10 <NA>   NA   NA   15
    # row22          <NA> <NA>    1   NA   NA   15