
Data.table setDT functionality in ff/ffbase R packages


I want to calculate a column of conditional (per-group) means with the ff/ffbase packages. I'm looking for functionality in ff/ffbase that allows data manipulation similar to what is done below with the data.table package:

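library(data.table)
irisdf <- as.data.table(iris)
class(irisdf)
# "data.table" "data.frame"
# Add a column holding, for every row, the mean Sepal.Length of that row's Species
irisdf[, NewMean := mean(Sepal.Length), by = Species]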

ffbase has a function for conditional means, but it returns a vector whose length is the number of classes in irisdf[,5]:

condMean(x = irisdf[,1], index = irisdf[,5], na.rm = FALSE)

rather than a new vector of length nrow(irisdf).

As @BondedDust suggested, ave() from base R gives the right output:

VectorOfMeans <- ave(irisdf[,1], irisdf[,5], FUN=mean)
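
To make the difference between a per-group result and a per-row result concrete, here is a minimal sketch using only base R and the built-in iris data frame. It does not use ff/ffbase at all; tapply() merely stands in for any function that returns one value per class, and the variable names are illustrative.

```r
# Base R only; uses the built-in iris data set (150 rows, 3 species).

# One mean per species: the shape a per-group summary returns.
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# One mean per row: each row gets the mean of its own species.
row_means <- ave(iris$Sepal.Length, iris$Species, FUN = mean)

length(group_means)  # 3
length(row_means)    # 150

# Indexing the per-group result by each row's species expands it
# to the same per-row vector that ave() produces.
expanded <- as.vector(group_means[as.character(iris$Species)])
all.equal(expanded, row_means)  # TRUE
```

The last step shows that a per-group vector can always be expanded to row length by indexing it with the grouping factor, which is what ave() does in a single call.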

So the final question is how to add VectorOfMeans to irisdf. I've tried the code below, which works:

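library(ffbase)
irisdf <- as.ffdf(iris)
# Per-row group means, computed in RAM and converted to a one-column ffdf
VectorOfMeans <- as.ffdf(as.ff(ave(irisdf[,1], irisdf[,5], FUN=mean)))
# Bind the new column to irisdf (the original passed an undefined object, df)
irisdf <- cbind.ffdf2(irisdf, VectorOfMeans)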

This uses cbind.ffdf2 from an SO answer, but I suspect that SO question was about something more specific than mine, and that there is an easier (faster) way to do this. I would like to be able to run bigglm.ff on the resulting dataset (irisdf in the example), so please consider my question about merging VectorOfMeans and irisdf in that context (there are issues with physical/virtual modes of storage that I don't understand in detail).


Solution

  • Perhaps this helps:

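    library(data.table)
    library(ffbase)
    x1 <- as.ffdf(iris)
    # Process the ffdf chunk-wise in RAM, keeping rows of the same Species together
    fd1 <- ffdfdply(x1, split = as.character(x1$Species), FUN = function(x) {
      x2 <- as.data.table(x)
      # Add the per-species mean as a new column, one value per row
      res <- x2[, NewMean := mean(Sepal.Length), by = Species]
      # ffdfdply expects FUN to return a data.frame
      as.data.frame(res)
    }, trace = TRUE)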