r, memory-management, bigdata, ff

How to column-bind two ffdf objects


Suppose I have two ffdf objects:

library(ff)
ff1 <- as.ffdf(data.frame(matrix(rnorm(10*10),ncol=10)))
ff2 <- ff1
colnames(ff2) <- 1:10

How can I column bind these without loading them into memory? cbind doesn't work.

The same question was asked at http://stackoverflow.com/questions/18355686/columnbind-ff-data-frames-in-r, but it has no MWE and its author abandoned it, so I am reposting it here.


Solution

  • You can use the following function, cbind.ffdf2, as long as the two input ffdf objects have no column names in common:

    library(ff)
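    # ff cannot store character vectors, so keep the letters as factors
    # (stringsAsFactors defaults to FALSE from R 4.0 onwards)
    ff1 <- as.ffdf(data.frame(letA = letters[1:5], numA = 1:5, stringsAsFactors = TRUE))
    ff2 <- as.ffdf(data.frame(letB = letters[6:10], numB = 6:10, stringsAsFactors = TRUE))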
    
    cbind.ffdf2 <- function(d1, d2){
      D1names <- colnames(d1)
      D2names <- colnames(d2)
      mergeCall <- do.call("ffdf", c(physical(d1), physical(d2)))
      colnames(mergeCall) <- c(D1names, D2names)
      mergeCall
    }
    
    cbind.ffdf2(ff1, ff2)[,]
    

    Result:

       letA numA letB numB
    1   a    1    f     6
    2   b    2    g     7
    3   c    3    h     8
    4   d    4    i     9
    5   e    5    j    10
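
The requirement that column names must not clash can be enforced in code rather than left to the caller. The following is a minimal sketch of that idea, not part of the ff package: the name cbind_ffdf_checked and the row-count check are assumptions added here for illustration, and it has only been reasoned through for ffdf objects whose columns are plain ff vectors (no matrix columns).

```r
library(ff)

# Hypothetical helper (not an ff function): validate the inputs, then reuse
# the physical() + ffdf() construct from the answer above.
cbind_ffdf_checked <- function(d1, d2) {
  stopifnot(inherits(d1, "ffdf"), inherits(d2, "ffdf"))

  # Column binding only makes sense when both tables have the same row count.
  if (nrow(d1) != nrow(d2)) {
    stop("row counts differ: ", nrow(d1), " vs ", nrow(d2))
  }

  # Fail early on clashing column names instead of producing an ambiguous result.
  n1 <- colnames(d1)
  n2 <- colnames(d2)
  dup <- intersect(n1, n2)
  if (length(dup) > 0) {
    stop("duplicate column names: ", paste(dup, collapse = ", "))
  }

  # physical() returns the underlying ff vectors; ffdf() wraps them in a new
  # ffdf without pulling the data into RAM.
  out <- do.call("ffdf", c(physical(d1), physical(d2)))
  colnames(out) <- c(n1, n2)
  out
}

a <- as.ffdf(data.frame(x = 1:3, y = c(0.5, 1.5, 2.5)))
b <- as.ffdf(data.frame(z = 4:6))
res <- cbind_ffdf_checked(a, b)
colnames(res)
```

Because the result is built from the same ff vectors as the inputs, it most likely shares their backing files rather than copying them, so changes written through one object may be visible through the other.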