Search code examples
rmulti-levelnlme

How to add level2 predictors in multilevel regression (package nlme)


I have a question concerning multi level regression models in R, specifically how to add predictors for my level 2 "measure".

Please consider the following example (this is not a real dataset, so the values might not make much sense in reality):

date        id    count    bmi    poll
2012-08-05  1     3        20.5   1500
2012-08-06  1     2        20.5   1400
2012-08-05  2     0        23     1500
2012-08-06  2     3        23     1400

The data contains

  • different persons ("id"...so it's two persons)
  • the body mass index of each person ("bmi", so it doesn't vary within an id)
  • the number of heart problems each person has on a specific day ("count). So person 1 had three problems on August the 5th, whereas person 2 had no difficulties/problems on that day
  • the amount of pollutants (like Ozon or sulfit dioxide) which have been measured on that given day

My general research question is, if the amount of pollutants effects the numer of heart problems in the population. In a first step, this could be a simple linear regression: lm(count ~ poll)

However, my data for each day is so to say clustered within persons. I have two measures from person 1 and two measures from person 2.

So my basic idea was to set up a multilevel model with persons (id) as my level 2 variable.

I used the nlme package for this analysis:

lme(fixed=count ~ poll, random = ~poll|id, ...)

No problems so far.

However, the true influence on level 2 might not only come from the fact that I have different persons. Rather it would be much more likely that the effect WITHIN a person might come from his or her bmi (and many other person related variables, like age, amount of smoking and so on).

To make a longstory short:

How can I specify such level two predictors in the lme function?

Or in other words: How can I setup a model, where the relationship between heart problems and pollution is different/clustered/moderated by the body mass index of a person (and as I said maybe additionally by this person's amount of smoking or age)

Unfortunately, I don't have a clue, how to tell R, what I want. I know oif other software (one of them called HLM), which is capable of doing waht I want, but I'm quite sure that R can this as well...

So, many thanks for any help!

deschen


Solution

  • Short answer: you do not have to, as long as you correctly specify random effects. The lme function automatically detects which variables are level 1 or 2. Consider this example using Oxboys where each subject was measured 9 times. For the time being, let me use lmer in the lme4 package.

    library(nlme)
    library(dplyr)
    library(lme4)
    library(lmerTest)
    
    Oxboys %>%                                                #1
      filter(as.numeric(Subject)<25) %>%                      #2
      mutate(Group=rep(LETTERS[1:3], each=72)) %>%            #3
      lmer(height ~ Occasion*Group + (1|Subject), data=.) %>% #4     
      anova()                                                 #5  
    

    Here I am picking 24 subjects (#2) and arranging them into 3 groups (#3) to make this data balanced. Now the design of this study is a split-plot design with a repeated-measures factor (Occasion) with q=9 levels and a between-subject factor (Group) with p=3 levels. Each group has n=8 subjects. Occasion is a level-1 variable while Group is level 2.

    In #4, I did not specify which variable is level 1 or 2, but lmer gives you correct output. How do I know it is correct? Let us check the multi-level model's degrees of freedom for the fixed effects. If your data is balanced, the Kenward–Roger approximation used in the lmerTest will give you exact dfs and F/t-ratios according to this article. That is, in this example dfs for the test of Group, Occasion, and their interaction should be p-1=2, q-1=8, and (p-1)*(q-1)=16, respectively. The df for the Subject error term is (n-1)p = 21 and the df for the Subject:Occasion error term is p(n-1)(q-1)=168. In fact, these are the "exact" values we get from the anova output (#5).

    I do not know what algorithm lme uses for approximating dfs, but lme does give you the same dfs. So I am assuming that it is accurate.