Replace existing column in MSR


Why does the following MSR code not replace the original column "Var1"?

rxDataStep(inData = input_xdf, outFile = input_xdf, overwrite = TRUE,
       transforms = list(Var1 = as.numeric(Var1)),
       transformVars = c("Var1")
       )

Solution

  • At the moment, RevoScaleR doesn't support changing the type of a variable in an xdf file (even if you write to a different file). The workaround is to create a new variable, drop the old one, and then rename the new variable to the old name.

    I would suggest doing this with a transformFunc (see ?rxTransform for more information), so that you can create the new variable and drop the old, all in one step:

    rxDataStep(inXdf, outXdf, transformFunc=function(varlst) {
        varlst$Var1b <- as.numeric(varlst$Var1)
        varlst$Var1 <- NULL
        varlst
    }, transformVars="Var1")
    
    # this is fast as it only modifies the xdf metadata, not the data itself
    names(outXdf)[names(outXdf) == "Var1b"] <- "Var1"
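
To see what the transform function above does to each chunk, it can be called directly on a plain list in base R, without RevoScaleR. This is only a sketch: the `chunk` data and the name `convert_var1` are made up for illustration, and it assumes Var1 is stored as character.

```r
# Hypothetical stand-in for one chunk of data: a named list holding the
# variables requested via transformVars (values are invented for this sketch).
chunk <- list(Var1 = c("1", "2.5", "10"))

# Same body as the transformFunc in the answer above, given a name
# (convert_var1 is an illustrative name, not part of RevoScaleR).
convert_var1 <- function(varlst) {
    varlst$Var1b <- as.numeric(varlst$Var1)  # add a numeric copy under a new name
    varlst$Var1 <- NULL                      # drop the original variable
    varlst
}

result <- convert_var1(chunk)
str(result)  # inspect the names and types left in the chunk
```

If Var1 is a factor rather than character, `as.numeric(varlst$Var1)` returns the integer level codes, so `as.numeric(as.character(varlst$Var1))` would be needed instead.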