Search code examples
rconcatenationxlsx

Concatenate .xlsx files with successive column names in R


Let's say I have data frames like this:

df1 <- as.data.frame(rbind(c(182, 142, 120), c(640, 381, 192)))
colnames(df1) <- c("Q1", "Q2", "Q3")

df2 <- as.data.frame(rbind(c(73, 51, 67, 38), c(62, 42, 33, 46), c(84, 91, 77, 56)))
colnames(df2) <- c("Q4", "Q5", "Q6", "Q7")

df3 <- as.data.frame(t(c(1820, 2114)))
colnames(df3) <- c("Q8", "Q9")

df1
  Q1  Q2  Q3
1 182 142 120
2 640 381 192

df2
  Q4 Q5 Q6 Q7
1 73 51 67 38
2 62 42 33 46
3 84 91 77 56

df3
  Q8   Q9
1 1820 2114

Note that I have 3 data frames with different lengths. What I'd like to achieve is a concatenation of one above the other one, indicating its respective column names at doing so.

Desired output:

  1 Q1   Q2   Q3
  2 182  142  120
  3 640  381  192
  4 Q4   Q5   Q6   Q7
  5 73   51   67   38
  6 62   42   33   46
  7 84   91   77   56
  8 Q8   Q9
  9 1820 2114

I'd been trying to do this first by exporting files to .xlsx then concatenating them, but I only manage to sort of cbind them

Thanks in advance


Solution

  • Not sure whether your requested format makes much sense in R, since

    1. all elements of a column of a data.frame have to be of the same class, hence will be coerced to character once you add the header lines as rows

    2. all rows of a data.frame have to be of the same length, hence you would need to patch the data.frames with less columns with NA columns

    But below would be one solution that would give you a data.frame containing the data in the way you showed in your desired output:

    df1 <- as.data.frame(rbind(c(182, 142, 120), c(640, 381, 192)))
    colnames(df1) <- c("Q1", "Q2", "Q3")
    
    df2 <- as.data.frame(rbind(c(73, 51, 67, 38), c(62, 42, 33, 46), c(84, 91, 77, 56)))
    colnames(df2) <- c("Q4", "Q5", "Q6", "Q7")
    
    df3 <- as.data.frame(t(c(1820, 2114)))
    colnames(df3) <- c("Q8", "Q9")
    
    maxCol <- max(ncol(df1), ncol(df2), ncol(df3))
    
    match_df <- function(x, ncol){
        delta <- ncol - dim(x)[2]
        out <- {
            if(delta > 0) unname(data.frame(append((rbind(colnames(x), unname(x))), 
                c(rep(NA, delta))))) else 
            unname(data.frame((rbind(colnames(x), unname(x)))))[, seq_len(ncol)]}
        colnames(out) <- paste0("V", seq_len(ncol))
        out
    }
    out <- unname(do.call(rbind, lapply(list(df1, df2, df3), match_df, ncol=maxCol)))
    print(out, na.print = "")
    #>                   
    #> 1   Q1   Q2  Q3   
    #> 2  182  142 120   
    #> 3  640  381 192   
    #> 4   Q4   Q5  Q6 Q7
    #> 5   73   51  67 38
    #> 6   62   42  33 46
    #> 7   84   91  77 56
    #> 8   Q8   Q9       
    #> 9 1820 2114
    

    A more natural way in R that maintains the numeric data as numeric would be to "sort of cbind them", as you put it, and, if you wish, melt and filter them to remove NA values:

    library(data.table)
    melt(rbindlist(list(df1, df2, df3), fill=TRUE), na.rm=TRUE)
    #>     variable value
    #>  1:       Q1   182
    #>  2:       Q1   640
    #>  3:       Q2   142
    #>  4:       Q2   381
    #>  5:       Q3   120
    #>  6:       Q3   192
    #>  7:       Q4    73
    #>  8:       Q4    62
    #>  9:       Q4    84
    #> 10:       Q5    51
    #> 11:       Q5    42
    #> 12:       Q5    91
    #> 13:       Q6    67
    #> 14:       Q6    33
    #> 15:       Q6    77
    #> 16:       Q7    38
    #> 17:       Q7    46
    #> 18:       Q7    56
    #> 19:       Q8  1820
    #> 20:       Q9  2114