Search code examples
rlme4tapplyrandom-effects

How to back-transform with a continuous variable


I would like to know how to properly back-transform the output from a univariate linear mixed effects model in order to interpret it. I have not posted data to go along with my question because my question should be answerable without data.

My model (simplified for the purposes of this question):

library(lme4)
m1<-lmer(activity ~ sex + BirthDate+ (1|id), data=merge.data)

> m1
Linear mixed model fit by REML ['lmerMod']
Formula: activity ~ sex + BirthDate + (1 | id)
   Data: merge.data
REML criterion at convergence: 572.0483
Random effects:
 Groups   Name        Std.Dev.
 id    (Intercept) 0.7194  
 Residual             1.4651  
    Number of obs: 150, groups:  id, 89
    Fixed Effects:
   (Intercept)            sexM       BirthDate  
      -0.08661         0.20718         0.43022  

Where:

  • activity is a continuous response variable
  • sex is a categorical variable with 2 levels (female and male)
  • BirthDate is a continuous variable; BirthDate is the number of days since January 1st and then it is mean centred and standardized to one standard deviation
  • id is a random effect for individual identity
  • merge.data is the name of my dataset

Before BirthDate is mean centred and standardized to one standard deviation:

> summary(merge.data$BirthDate)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  94.96  115.96  121.96  122.67  127.96  138.96 

After BirthDate is mean centred and standardized to one standard deviation:

merge.data<-merge.data %>%
    mutate(BirthDate = ((BirthDate-mean(BirthDate))/(1*(sd(BirthDate)))))

> summary(merge.data$BirthDate)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-3.09082 -0.74816 -0.07883  0.00000  0.59050  1.81761 

I would like to know what the mean value is for both sex and BirthDate. Based on reading The R Book by Crawley, I can get the mean from my model m1 with the following code:

tapply(predict(m1,type="response"), merge.data$sex,mean) #gives you the back-transformed mean for sex from the model "m1"

 F           M 
-0.08334649  0.11199685

Which says that the mean activity score for females is -0.083 and males is 0.11.

When I try this for BirthDate, like so:

 tapply(predict(m1,type="response"), merge.data$BirthDate,mean)

  -3.09082367412411    -1.6406056364576   -1.52905040279094 #mean centered birth date
        -0.79030344         -0.87012920         -0.44792213 #activity score

and so on...

What I end up getting is 1 mean value for every birth date (BirthDate is mean centred and standardized to one standard deviation). Unlike with sex, I can't really do anything with that information... I am trying to present the effect (effect size) of increasing birth date on activity.

What I would like to ultimately do is say that for every 1 day increase in birth date, there is an [number from model] increase in activity score.


Solution

  • When you print out the model by typing m1, this part:

        Fixed Effects:
       (Intercept)            sexM       BirthDate  
          -0.08661         0.20718         0.43022  
    

    tells you the slopes, i.e. how much the result will change based on a change in the inputs. In particular, if you increase BirthDate by one (and keep everything else the same), the predicted activity score will increase by 0.43022.

    You do not provide any data so I cannot work directly with your data and your model. Instead, I will illustrate with some data built into R, the iris data.

    ## Build a linear model
    Mod1 = lm(Petal.Length ~ ., data=iris[,1:4])
    

    Now we could just type Mod1, but that gives more than I care to see. We can restrict our attention to the interesting part using

    Mod1$coefficients
     (Intercept) Sepal.Length  Sepal.Width  Petal.Width 
      -0.2627112    0.7291384   -0.6460124    1.4467934
    

    This gives the slope for each of the predictor variables (and the intercept). I want to illustrate how the response Petal.Length varies with the inputs. I will just take some point and change one predictor and look at the result.

    NewPoint = iris[30,1:4]
    NewPoint[,1] = NewPoint[,1]+1
    iris[30, 1:4]
       Sepal.Length Sepal.Width Petal.Length Petal.Width
    30          4.7         3.2          1.6         0.2
    NewPoint
       Sepal.Length Sepal.Width Petal.Length Petal.Width
    30          5.7         3.2          1.6         0.2
    

    You can see that NewPoint is the same as the original point iris[30,1:4] except that Sepal.Length has been increased by 1. How does that affect the prediction?

    predict(Mod1, newdata=iris[30,1:4])
          30 
    1.386358 
    predict(Mod1, newdata=NewPoint)
          30 
    2.115497 
    predict(Mod1, newdata=NewPoint) - predict(Mod1, newdata=iris[30,1:4])
           30 
    0.7291384
    

    The difference in the predicted values is 0.7291384, which is the coefficient for Sepal.Length shown above.