Calculate a column of conditional means in the ff/ffbase packages
I'm looking for functionality in the ff/ffbase packages that would let me manipulate data the way the following data.table code does:
library(data.table)
irisdf <- as.data.table(iris)
class(irisdf)
# "data.table" "data.frame"
# add a column holding the mean Sepal.Length of each row's Species
irisdf[, NewMean := mean(Sepal.Length), by = Species]
There is a function for conditional means in ffbase, but it returns a vector with one element per class of irisdf[,5]:
condMean(x = irisdf[,1], index = irisdf[,5], na.rm = FALSE)
rather than a new vector of length nrow(irisdf).
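A per-class result can still be turned into a per-row column by indexing it with the factor's integer codes. The sketch below uses base R's tapply as a stand-in for condMean (an assumption: it is not condMean's own output) on an in-memory data.frame; for a large ffdf the same lookup would presumably have to be applied chunk by chunk.

```r
# Per-class means: one value per level of Species, in level order
class_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# Expand to one value per row by indexing with the factor's integer codes;
# as.numeric() drops the level names carried over from tapply()
row_means <- as.numeric(class_means[as.integer(iris$Species)])

length(class_means)  # 3
length(row_means)    # 150, i.e. nrow(iris)
```

The result is the same vector that ave() produces, so either route gives a column of length nrow().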
As @BondedDust suggested, ave() from base R (package stats) gives the right output:
VectorOfMeans <- ave(irisdf[,1], irisdf[,5], FUN=mean)
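One detail when switching from condMean to ave: ave's `...` arguments are grouping variables, not extra arguments for FUN, so there is no direct equivalent of `na.rm = TRUE`. Wrapping mean in an anonymous function handles that. A minimal sketch on a small made-up vector (the data is hypothetical):

```r
x   <- c(1, 2, NA, 4, 5, 6)
grp <- factor(c("a", "a", "a", "b", "b", "b"))

# Plain mean: any NA in a group makes that whole group's result NA
m_na <- ave(x, grp, FUN = mean)            # NA NA NA 5 5 5

# Pass na.rm through a wrapper, since ave() would treat it as a grouping variable
m_skip <- ave(x, grp, FUN = function(v) mean(v, na.rm = TRUE))  # 1.5 1.5 1.5 5 5 5
```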
So the final question is how to add VectorOfMeans to irisdf. I tried the code below, which works:
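library(ffbase)
irisdf <- as.ffdf(iris)
VectorOfMeans <- as.ffdf(as.ff(ave(irisdf[,1], irisdf[,5], FUN=mean)))
# cbind.ffdf2 is not part of ff/ffbase; it comes from the SO answer mentioned below
irisdf <- cbind.ffdf2(irisdf, VectorOfMeans)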
using cbind.ffdf2 from another SO answer. However, that question was about something more specific than my case, and I suspect there is an easier (faster) way to do this. I would like to be able to run bigglm.ff on the resulting dataset (irisdf in the example). See also my question about merging VectorOfMeans and irisdf in this context (there are issues with physical/virtual storage modes that I don't understand in detail).
Perhaps this helps
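library(data.table)
library(ffbase)
x1 <- as.ffdf(iris)
# ffdfdply() processes x1 in chunks and keeps all rows of a split value together;
# one chunk may still hold several Species, so group by Species inside FUN
fd1 <- ffdfdply(x1, split=as.character(x1$Species), FUN=function(x) {
  x2 <- as.data.table(x)
  res <- x2[, NewMean := mean(Sepal.Length), by = Species]
  as.data.frame(res)
}, trace=TRUE)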