I would like to know how to properly back-transform the output from a univariate linear mixed effects model in order to interpret it. I have not posted data to go along with my question because my question should be answerable without data.
My model (simplified for the purposes of this question):
library(lme4)
m1<-lmer(activity ~ sex + BirthDate+ (1|id), data=merge.data)
> m1
Linear mixed model fit by REML ['lmerMod']
Formula: activity ~ sex + BirthDate + (1 | id)
Data: merge.data
REML criterion at convergence: 572.0483
Random effects:
Groups Name Std.Dev.
id (Intercept) 0.7194
Residual 1.4651
Number of obs: 150, groups: id, 89
Fixed Effects:
(Intercept) sexM BirthDate
-0.08661 0.20718 0.43022
Where:
activity
is a continuous response variablesex
is a categorical variable with 2 levels (female and male)BirthDate
is a continuous variable; BirthDate
is the number of days since January 1st and then it is mean centred and standardized to one standard deviationid
is a random effect for individual identitymerge.data
is the name of my datasetBefore BirthDate
is mean centred and standardized to one standard deviation:
> summary(merge.data$BirthDate)
Min. 1st Qu. Median Mean 3rd Qu. Max.
94.96 115.96 121.96 122.67 127.96 138.96
After BirthDate
is mean centred and standardized to one standard deviation:
merge.data<-merge.data %>%
mutate(BirthDate = ((BirthDate-mean(BirthDate))/(1*(sd(BirthDate)))))
> summary(merge.data$BirthDate)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-3.09082 -0.74816 -0.07883 0.00000 0.59050 1.81761
I would like to know what the mean value is for both sex
and BirthDate
. Based on reading The R Book by Crawley, I can get the mean from my model m1
with the following code:
tapply(predict(m1,type="response"), merge.data$sex,mean) #gives you the back-transformed mean for sex from the model "m1"
F M
-0.08334649 0.11199685
Which says that the mean activity score for females is -0.083 and males is 0.11.
When I try this for BirthDate
, like so:
tapply(predict(m1,type="response"), merge.data$BirthDate,mean)
-3.09082367412411 -1.6406056364576 -1.52905040279094 #mean centered birth date
-0.79030344 -0.87012920 -0.44792213 #activity score
and so on...
What I end up getting is 1 mean value for every birth date (BirthDate
is mean centred and standardized to one standard deviation). Unlike with sex
, I can't really do anything with that information... I am trying to present the effect (effect size) of increasing birth date on activity.
What I would like to ultimately do is say that for every 1 day increase in birth date, there is an [number from model] increase in activity score.
When you print out the model by typing m1
, this part:
Fixed Effects:
(Intercept) sexM BirthDate
-0.08661 0.20718 0.43022
tells you the slopes, i.e. how much the result will change based on a change in the inputs. In particular, if you increase BirthDate by one (and keep everything else the same), the predicted activity score will increase by 0.43022.
You do not provide any data so I cannot work directly with your data and your model. Instead, I will illustrate with some data built into R, the iris data.
## Build a linear model
Mod1 = lm(Petal.Length ~ ., data=iris[,1:4])
Now we could just type Mod1
, but that gives more than I care to see. We can restrict our attention to the interesting part using
Mod1$coefficients
(Intercept) Sepal.Length Sepal.Width Petal.Width
-0.2627112 0.7291384 -0.6460124 1.4467934
This gives the slope for each of the predictor variables (and the intercept).
I want to illustrate how the response Petal.Length
varies with the inputs.
I will just take some point and change one predictor and look at the result.
NewPoint = iris[30,1:4]
NewPoint[,1] = NewPoint[,1]+1
iris[30, 1:4]
Sepal.Length Sepal.Width Petal.Length Petal.Width
30 4.7 3.2 1.6 0.2
NewPoint
Sepal.Length Sepal.Width Petal.Length Petal.Width
30 5.7 3.2 1.6 0.2
You can see that NewPoint
is the same as the original point iris[30,1:4]
except that Sepal.Length has been increased by 1. How does that affect the prediction?
predict(Mod1, newdata=iris[30,1:4])
30
1.386358
predict(Mod1, newdata=NewPoint)
30
2.115497
predict(Mod1, newdata=NewPoint) - predict(Mod1, newdata=iris[30,1:4])
30
0.7291384
The difference in the predicted values is 0.7291384, which is the coefficient for Sepal.Length shown above.