
Mean-centered covariate in lm(): making the call printed by summary() show the actual mean value


I am writing some exercises using r-exams and ran into this problem: I am fitting a simple linear model with a mean-centered covariate. Here is the code:

## DATA GENERATION
set.seed(123)
n <- rpois(1, 120)
age <- runif(n, 0, 25)
m_age <- round(mean(age), 4)
wght <- 100 + 0.8 * age + rnorm(n, 0, 4)

z0_aw <- data.frame(age, weight = wght)

m0 <- lm(weight ~ I(age - m_age), data = z0_aw)
summary(m0)
#> 
#> Call:
#> lm(formula = weight ~ I(age - m_age), data = z0_aw)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -9.2226 -2.4603 -0.2445  2.3565 13.0428 
#> 
#> Coefficients:
#>                 Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)    110.02678    0.36571  300.86   <2e-16 ***
#> I(age - m_age)   0.78432    0.05069   15.47   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.888 on 111 degrees of freedom
#> Multiple R-squared:  0.6832, Adjusted R-squared:  0.6804 
#> F-statistic: 239.4 on 1 and 111 DF,  p-value: < 2.2e-16

m1 <- lm(weight ~ I(age - 12.679), data = z0_aw)
summary(m1)
#> 
#> Call:
#> lm(formula = weight ~ I(age - 12.679), data = z0_aw)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -9.2226 -2.4603 -0.2445  2.3565 13.0428 
#> 
#> Coefficients:
#>                  Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)     110.02678    0.36571  300.86   <2e-16 ***
#> I(age - 12.679)   0.78432    0.05069   15.47   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.888 on 111 degrees of freedom
#> Multiple R-squared:  0.6832, Adjusted R-squared:  0.6804 
#> F-statistic: 239.4 on 1 and 111 DF,  p-value: < 2.2e-16

Created on 2021-02-14 by the reprex package (v0.3.0)

As you can see, the call printed in the summary of model m0 is lm(formula = weight ~ I(age - m_age), data = z0_aw)

I want to obtain the same output as for model m1, that is, lm(formula = weight ~ I(age - 12.679), data = z0_aw)

However, I have to use the object m_age, because the exercise is generated at random to prevent cheating in exams. I tried lm(weight~I(age-eval(m_age)),data=z0_aw), but the call printed in the summary is then lm(formula = weight ~ I(age - eval(m_age)), data = z0_aw)

It is very important for me to get the output lm(formula = weight ~ I(age - 12.679), data = z0_aw), since it will be used in some of the questions.


Solution

  • Rather than wrapping m_age in eval() inside the formula, I would build the call for the entire lm() and then replace only the symbol m_age with its value. You can do so by:

    cl <- call("lm", formula = weight ~ I(age - m_age), data = as.name("z0_aw"))
    cl$formula[[3]][[2]][[3]] <- m_age
    cl
    ## lm(formula = weight ~ I(age - 12.679), data = z0_aw)
    

    As a brief explanation:

    • The call() to the function named "lm" is constructed with two arguments, formula = and data =. call() evaluates its arguments, but a formula is symbolic, so evaluating it just yields the formula object without looking up m_age. The data name "z0_aw" is coerced to a "symbol" (or "name") with as.name(), so that the call contains the name z0_aw rather than the evaluated data.frame.
    • In cl$formula we then replace the symbol/name m_age with the numeric value 12.679. The symbol sits in the right-hand side of the formula (3rd element), in the argument of I() (2nd element), in the right-hand side of the difference (3rd element).
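
To see where the index path [[3]][[2]][[3]] comes from, it can help to walk down the formula one level at a time. The following is a minimal sketch; it assumes m_age = 12.679 as in the output above, and the names cl_demo and f are introduced here for illustration only.

```r
# Assumed value, copied from the printed output above
m_age <- 12.679

# Same construction as in the answer; lm() is not evaluated here,
# so the data frame z0_aw does not need to exist yet
cl_demo <- call("lm", formula = weight ~ I(age - m_age),
                data = as.name("z0_aw"))
f <- cl_demo$formula

f[[1]]            # the operator `~`
f[[2]]            # left-hand side: weight
f[[3]]            # right-hand side: I(age - m_age)
f[[3]][[1]]       # the function I
f[[3]][[2]]       # its argument: age - m_age
f[[3]][[2]][[1]]  # the operator `-`
f[[3]][[2]][[2]]  # left operand: age
f[[3]][[2]][[3]]  # right operand: the symbol m_age, which gets replaced
```

Each [[ ]] step picks one element of a call, where element 1 is always the function or operator and the remaining elements are its arguments.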

    Finally, evaluating that call yields the desired lm object:

    m <- eval(cl)
    summary(m)
    ## Call:
    ## lm(formula = weight ~ I(age - 12.679), data = z0_aw)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -9.2226 -2.4603 -0.2445  2.3565 13.0428 
    ## 
    ## Coefficients:
    ##                  Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)     110.02678    0.36571  300.86   <2e-16 ***
    ## I(age - 12.679)   0.78432    0.05069   15.47   <2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 3.888 on 111 degrees of freedom
    ## Multiple R-squared:  0.6832, Adjusted R-squared:  0.6804 
    ## F-statistic: 239.4 on 1 and 111 DF,  p-value: < 2.2e-16
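
As an alternative sketch that is not part of the answer above, base R's bquote() can insert the value without indexing into the formula by hand: anything wrapped in .() is evaluated and spliced into the quoted expression. The data generation is repeated from the question so that the block runs on its own; the names cl2 and m2 are introduced here for illustration only.

```r
# Data generation, repeated from the question
set.seed(123)
n <- rpois(1, 120)
age <- runif(n, 0, 25)
m_age <- round(mean(age), 4)
wght <- 100 + 0.8 * age + rnorm(n, 0, 4)
z0_aw <- data.frame(age, weight = wght)

# bquote() quotes the whole lm() call but evaluates .(m_age),
# so the number itself ends up inside the formula
cl2 <- bquote(lm(formula = weight ~ I(age - .(m_age)), data = z0_aw))

# Evaluating the call fits the model; the stored call should then
# contain the number where m_age used to be
m2 <- eval(cl2)
m2$call
```

This avoids hard-coding the position of m_age, which may be convenient if the formula changes between exercises, at the cost of writing the model inside bquote().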