
Shortening the Length of Function Calls in R - RevoScaleR rxGlm()


I'm using R to fit some GLM models on a large data set at the moment. Because of its size I'm using the rxGlm() function in the RevoScaleR package, which runs a lot faster than the base glm() function.

I'm keeping all of the function calls in an R script so that I can reproduce my work later - audit trail, etc.

My function calls are very long because I have a lot of factors (~50). They all look something like this:

rxGlm_C <- rxGlm(Dependent.Variable ~
               1 +
               Factor1 +
               Factor2 +
               Factor3 +
               # ........... (remaining factors)
               FactorN,
             family = tweedie(var.power = 1.5, link.power = 0),
             data = myDataFrame,
             pweights = "Weight.Variable"
)

If, afterwards, I want to rerun the model fit but perhaps with just a slight change to the formula - typically removing a single factor at a time - is there any shorthand notation for this? At the moment I'm copying and pasting the function call into my script file and manually deleting single rows. Is there instead some kind of syntax that says:

"please fit the exact same GLM as last time, but remove Factor 13"?

It would make my script files an awful lot shorter. I've got about 3,000 lines of code in there at the moment and I'm not finished yet!

Thanks. Alan


Solution

  • There are two cases. If you are using all the variables from myDataFrame, then you may simply write

    rxGlm(Dependent.Variable ~ .,
          family = tweedie(var.power = 1.5, link.power = 0),
          data = myDataFrame, pweights = "Weight.Variable")
    

    for the full model and then, say,

    rxGlm(Dependent.Variable ~ . - Factor13,
          family = tweedie(var.power = 1.5, link.power = 0),
          data = myDataFrame, pweights = "Weight.Variable")
    

    to drop Factor13.

    If you are not using all the variables, then you could save your full formula, say,

    frml <- y ~ Factor1 + Factor2 + Factor3
    

    and then use update:

    update(frml, ~ . - Factor3)
    # y ~ Factor1 + Factor2
    

    Note, though, that in this case . means "the same right hand side as in frml", rather than "all the variables" as in the former option.

    Also, in the latter case, you can build the full formula programmatically with paste and formula instead of typing out every term.
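
To illustrate the second case, here is a minimal sketch in base R of building the full formula from a vector of names and then dropping terms with update. The factor names (Factor1 to Factor5) and the objects full, full_alt, reduced and drop_one are placeholders invented for this example, and reformulate is shown only as a base R alternative to the paste approach; none of this is specific to RevoScaleR.

```r
# Hypothetical factor names; in practice these would be the real column names.
factors <- paste0("Factor", 1:5)

# Build the full formula from the character vector.
# reformulate() is a base R (stats) helper; the paste()/as.formula() line
# below constructs the same formula by hand.
full <- reformulate(factors, response = "Dependent.Variable")
full_alt <- as.formula(
  paste("Dependent.Variable ~", paste(factors, collapse = " + "))
)

# Drop a single term. Inside update(), "." stands for the existing
# left-hand or right-hand side of the old formula.
reduced <- update(full, . ~ . - Factor3)

# One reduced formula per factor, each with exactly that factor removed.
drop_one <- lapply(factors, function(f) {
  update(full, as.formula(paste(". ~ . -", f)))
})
names(drop_one) <- factors
```

Each stored formula could then be passed as the first argument of the model call in place of the typed-out formula, assuming rxGlm accepts a formula object held in a variable the way glm does. That keeps the script to one full specification plus one short update line per refit.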