Search code examples
rbayesianlme4p-value

Extracting the Bayesian p-value from lmer model in R


I am trying to extract Bayesian p-values (i.e., the proportion of estimates that are >0 if the point estimate is negative, or where the point estimate is positive, the proportion of estimates that is < 0)) from a lmer model I have made. I understand "p-values" are inherently frequentist, but I require the Bayesian p-values to appease a Reviewer (similar to this user).

For the purposes of reproducibility, I am using a dataset from R to illustrate my question. The dataset:

library(datasets)
data(ChickWeight) #importing data from base R
summary(ChickWeight)

 weight           Time           Chick         Diet   
 Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
 1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
 Median :103.0   Median :10.00   20     : 12   3:120  
 Mean   :121.8   Mean   :10.72   10     : 12   4:118  
 3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
 Max.   :373.0   Max.   :21.00   19     : 12          
                                 (Other):506          

My real data has both continuous and discrete predictor variables and a random effect for individual identity.

Creating the lmer model:

install.packages("lme4", dependencies=TRUE)
library(lme4)

m1<-lmer(weight ~ Time + Diet+ (1|Chick), data=ChickWeight)
summary(m1)

Linear mixed model fit by REML ['lmerMod']
Formula: weight ~ Time + Diet + (1 | Chick)
    Data: ChickWeight

REML criterion at convergence: 5584

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.0591 -0.5779 -0.1182  0.4962  3.4515 

Random effects:
 Groups   Name        Variance Std.Dev.
 Chick    (Intercept) 525.4    22.92   
 Residual             799.4    28.27   
Number of obs: 578, groups:  Chick, 50

Fixed effects:
            Estimate Std. Error t value
(Intercept)  11.2438     5.7887   1.942
Time          8.7172     0.1755  49.684
Diet2        16.2100     9.4643   1.713
Diet3        36.5433     9.4643   3.861
Diet4        30.0129     9.4708   3.169

Correlation of Fixed Effects:
      (Intr) Time   Diet2  Diet3 
Time  -0.307                     
Diet2 -0.550 -0.015              
Diet3 -0.550 -0.015  0.339       
Diet4 -0.550 -0.011  0.339  0.339

Unlike the ChickWeight dataset, my real dataset has estimates that are both positive and negative.

I then want to extract the 95% credible intervals from my model m1:

install.packages(c("MCMCglmm", "arm"), dependencies=TRUE)    
library(MCMCglmm)
library(arm)

sm1<-sim(m1,1000)
smfixef=sm1@fixef #fixed effects
smranef=sm1@ranef #random effects
smfixef=as.mcmc(smfixef) 

posterior.mode(smfixef) #extract estimates for fixed effects
(Intercept)        Time       Diet2       Diet3       Diet4 
  10.489143    8.800899   16.761983   31.684341   28.037318 

HPDinterval(smfixef) ##extract 95% credible intervals for fixed effects
                  lower     upper
(Intercept) -0.05392775 21.960966
Time         8.38244319  9.064171
Diet2       -0.46587564 34.061686
Diet3       17.90445947 53.817409
Diet4       11.17259787 48.467258
attr(,"Probability")
[1] 0.95

Now I want to get the Bayesian p-value:

install.packages("conting", dependencies=TRUE)
library(conting)
bayespval(object=sm1, n.burnin = 0, thin = 1, statistic = "X2") 
#this last line is the line I am having trouble with

Error: $ operator not defined for this S4 class

With how I set up my model m1, what is the correct format to extract the Bayesian p-values for each estimate?

There is an example posted with the original package/code, but my model is not set up as their model is.

I do not need to use this package and would be just as happy to calculate it from my 1000 simulations. In that case, I would need to know how to count how many of the estimates are below/above zero. That number / 1000 (total number of estimates) would be the Bayesian p-value.


Solution

  • To extract Bayesian p-values (i.e, the proportion of estimates that are >0 if the point estimate is negative, or where the point estimate is positive, the proportion of estimates that is < 0) you can extract the point estimates for each simulation and then divide by the number of simulations.

    To do this using the ChickWeight dataset and the above model, you then:

    library(datasets)
    data(ChickWeight)
    
    m1<-lmer(weight ~ Time + Diet+ (1|Chick), data=ChickWeight)
    
    sm1<-sim(m1,1000)
    smfixef=sm1@fixef
    smfixef=as.mcmc(smfixef) #this has the 1000 simulations in it for the fixed effects 
    
    as.mcmc(smfixef)
    Markov Chain Monte Carlo (MCMC) output:
    Start = 1 
    End = 1000 
    Thinning interval = 1 
            (Intercept)     Time        Diet2     Diet3      Diet4
       [1,] 17.52609243 8.381517   7.47169881 46.442343 19.7164997 #simulation 1
       [2,] 16.52854430 8.859378   8.83279931 29.017547 25.4610474 #simulation 2
       [3,]  4.00702870 8.830302  29.68309621 47.459395 35.1939344 #simulation 3
       [4,] 16.44162722 8.599929  15.87393285 31.946265 33.7513144 #simulation 4
       [5,] 21.07173579 8.596701   1.81909415 28.934133 19.0499998 #simulation 5
    etc.
    

    Then for each column you can code which simulations were above or below zero:

    p_Time=if_else(smfixef[,2]>0, 1,0) #Time variable (i.e., 2nd column)
    

    Because the point estimate for the Time variable is positive, you want to count the number of times an estimate for that variable was below zero:

    sum_p_Time=sum(p_Time<1)
    > sum_p_Time
    0 
    

    In this case it says all the estimates are above zero, therefore the Bayesian p-value is < 0.001. This supports what we see when we look at only the point estimate and 95 % credible intervals (i.e. Time estimate is 8.80 and 95 % credible intervals are (8.38, 9.06). In both cases we see there is strong support for Time having an effect on weight.