I am writing some exercises using r-exams and ran into this problem: I am fitting a simple linear model with a mean-centered covariate. Here is the code:
## DATA GENERATION
set.seed(123)
n<-rpois(1,120)
age<-runif(n,0,25)
m_age<-round(mean(age),4)
wght<-100+.8*age+rnorm(n,0,4)
z0_aw<-data.frame(age,weight=wght)
m0<-lm(weight~I(age-m_age),data=z0_aw)
summary(m0)
#>
#> Call:
#> lm(formula = weight ~ I(age - m_age), data = z0_aw)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -9.2226 -2.4603 -0.2445 2.3565 13.0428
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 110.02678 0.36571 300.86 <2e-16 ***
#> I(age - m_age) 0.78432 0.05069 15.47 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.888 on 111 degrees of freedom
#> Multiple R-squared: 0.6832, Adjusted R-squared: 0.6804
#> F-statistic: 239.4 on 1 and 111 DF, p-value: < 2.2e-16
m1<-lm(weight~I(age-12.679),data=z0_aw)
summary(m1)
#>
#> Call:
#> lm(formula = weight ~ I(age - 12.679), data = z0_aw)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -9.2226 -2.4603 -0.2445 2.3565 13.0428
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 110.02678 0.36571 300.86 <2e-16 ***
#> I(age - 12.679) 0.78432 0.05069 15.47 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.888 on 111 degrees of freedom
#> Multiple R-squared: 0.6832, Adjusted R-squared: 0.6804
#> F-statistic: 239.4 on 1 and 111 DF, p-value: < 2.2e-16
Created on 2021-02-14 by the reprex package (v0.3.0)
As you can see, the call shown in the summary of model m0 is
lm(formula = weight ~ I(age - m_age), data = z0_aw)
I want to obtain the same call as in model m1, that is
lm(formula = weight ~ I(age - 12.679), data = z0_aw)
However, using the object m_age is essential, since the exercise is generated at random to prevent cheating in exams. I tried something like lm(weight~I(age-eval(m_age)),data=z0_aw)
but the call shown in the summary is then lm(formula = weight ~ I(age - eval(m_age)), data = z0_aw)
It is very important for me to get the output lm(formula = weight ~ I(age - 12.679), data = z0_aw), since it will be used in some questions.
Rather than just using eval(m_age), I would build up the call for the entire lm() and then substitute only m_age with its value. You can do so by:
cl <- call("lm", formula = weight ~ I(age - m_age), data = as.name("z0_aw"))
cl$formula[[3]][[2]][[3]] <- m_age
cl
## lm(formula = weight ~ I(age - 12.679), data = z0_aw)
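To see why the index path [[3]][[2]][[3]] lands on m_age, the formula can be walked one level at a time. This is only an illustrative sketch; the name f is introduced here and is not part of the code above.

```r
# A formula is a nested call, so its pieces can be extracted with [[ ]].
# Building the formula does not require m_age to exist yet.
f <- weight ~ I(age - m_age)

f[[1]]            # the `~` operator
f[[2]]            # left-hand side: weight
f[[3]]            # right-hand side: I(age - m_age)
f[[3]][[2]]       # argument of I(): age - m_age
f[[3]][[2]][[3]]  # right-hand side of the difference: the symbol m_age
```

The last element is a symbol, which is why assigning a number to that position replaces the name with its value in the printed call.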
As a brief explanation:
First, a call() to the function named "lm" is constructed with two arguments, formula = and data =. The formula is symbolic anyway and hence can safely be evaluated, while the data name "z0_aw" is coerced to a "symbol" or "name" rather than being evaluated to a data.frame.
In cl$formula we then replace the symbol/name m_age with the numeric value 12.679. It sits in the right-hand side of the formula (3rd element), in the argument of I() (2nd element), in the right-hand side of the difference (3rd element).
Finally, evaluating that call yields the desired lm object:
m <- eval(cl)
summary(m)
## Call:
## lm(formula = weight ~ I(age - 12.679), data = z0_aw)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.2226 -2.4603 -0.2445 2.3565 13.0428
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 110.02678 0.36571 300.86 <2e-16 ***
## I(age - 12.679) 0.78432 0.05069 15.47 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.888 on 111 degrees of freedom
## Multiple R-squared: 0.6832, Adjusted R-squared: 0.6804
## F-statistic: 239.4 on 1 and 111 DF, p-value: < 2.2e-16
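As a possible alternative to replacing the element by index, base R's bquote() can splice the value in directly. This is a sketch that is not part of the approach above: it regenerates the data from the question so that it runs on its own, and the names cl2 and m2 are introduced here for illustration.

```r
# Recreate the data from the question so the example is self-contained.
set.seed(123)
n <- rpois(1, 120)
age <- runif(n, 0, 25)
m_age <- round(mean(age), 4)
wght <- 100 + 0.8 * age + rnorm(n, 0, 4)
z0_aw <- data.frame(age, weight = wght)

# bquote() leaves the expression unevaluated, except for parts wrapped
# in .(), which are replaced by their current value.
cl2 <- bquote(lm(formula = weight ~ I(age - .(m_age)), data = z0_aw))

# Evaluating the call fits the model; the stored call now contains the
# number instead of the symbol m_age.
m2 <- eval(cl2)
m2$call
```

The advantage of this variant is that it does not depend on knowing the position of m_age inside the formula, which may help if the formula changes between exercises.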