I am trying to extract Bayesian p-values (i.e., the proportion of estimates that are >0 if the point estimate is negative, or where the point estimate is positive, the proportion of estimates that is < 0)) from a lmer
model I have made. I understand "p-values" are inherently frequentist, but I require the Bayesian p-values to appease a Reviewer (similar to this user).
For the purposes of reproducibility, I am using a dataset from R to illustrate my question. The dataset:
library(datasets)
data(ChickWeight) #importing data from base R
summary(ChickWeight)
weight Time Chick Diet
Min. : 35.0 Min. : 0.00 13 : 12 1:220
1st Qu.: 63.0 1st Qu.: 4.00 9 : 12 2:120
Median :103.0 Median :10.00 20 : 12 3:120
Mean :121.8 Mean :10.72 10 : 12 4:118
3rd Qu.:163.8 3rd Qu.:16.00 17 : 12
Max. :373.0 Max. :21.00 19 : 12
(Other):506
My real data has both continuous and discrete predictor variables and a random effect for individual identity.
Creating the lmer
model:
install.packages("lme4", dependencies=TRUE)
library(lme4)
m1<-lmer(weight ~ Time + Diet+ (1|Chick), data=ChickWeight)
summary(m1)
Linear mixed model fit by REML ['lmerMod']
Formula: weight ~ Time + Diet + (1 | Chick)
Data: ChickWeight
REML criterion at convergence: 5584
Scaled residuals:
Min 1Q Median 3Q Max
-3.0591 -0.5779 -0.1182 0.4962 3.4515
Random effects:
Groups Name Variance Std.Dev.
Chick (Intercept) 525.4 22.92
Residual 799.4 28.27
Number of obs: 578, groups: Chick, 50
Fixed effects:
Estimate Std. Error t value
(Intercept) 11.2438 5.7887 1.942
Time 8.7172 0.1755 49.684
Diet2 16.2100 9.4643 1.713
Diet3 36.5433 9.4643 3.861
Diet4 30.0129 9.4708 3.169
Correlation of Fixed Effects:
(Intr) Time Diet2 Diet3
Time -0.307
Diet2 -0.550 -0.015
Diet3 -0.550 -0.015 0.339
Diet4 -0.550 -0.011 0.339 0.339
Unlike the ChickWeight
dataset, my real dataset has estimates that are both positive and negative.
I then want to extract the 95% credible intervals from my model m1
:
install.packages(c("MCMCglmm", "arm"), dependencies=TRUE)
library(MCMCglmm)
library(arm)
sm1<-sim(m1,1000)
smfixef=sm1@fixef #fixed effects
smranef=sm1@ranef #random effects
smfixef=as.mcmc(smfixef)
posterior.mode(smfixef) #extract estimates for fixed effects
(Intercept) Time Diet2 Diet3 Diet4
10.489143 8.800899 16.761983 31.684341 28.037318
HPDinterval(smfixef) ##extract 95% credible intervals for fixed effects
lower upper
(Intercept) -0.05392775 21.960966
Time 8.38244319 9.064171
Diet2 -0.46587564 34.061686
Diet3 17.90445947 53.817409
Diet4 11.17259787 48.467258
attr(,"Probability")
[1] 0.95
Now I want to get the Bayesian p-value:
install.packages("conting", dependencies=TRUE)
library(conting)
bayespval(object=sm1, n.burnin = 0, thin = 1, statistic = "X2")
#this last line is the line I am having trouble with
Error: $ operator not defined for this S4 class
With how I set up my model m1
, what is the correct format to extract the Bayesian p-values for each estimate?
There is an example posted with the original package/code, but my model is not set up as their model is.
I do not need to use this package and would be just as happy to calculate it from my 1000 simulations. In that case, I would need to know how to count how many of the estimates are below/above zero. That number / 1000 (total number of estimates) would be the Bayesian p-value.
To extract Bayesian p-values (i.e, the proportion of estimates that are >0 if the point estimate is negative, or where the point estimate is positive, the proportion of estimates that is < 0) you can extract the point estimates for each simulation and then divide by the number of simulations.
To do this using the ChickWeight
dataset and the above model, you then:
library(datasets)
data(ChickWeight)
m1<-lmer(weight ~ Time + Diet+ (1|Chick), data=ChickWeight)
sm1<-sim(m1,1000)
smfixef=sm1@fixef
smfixef=as.mcmc(smfixef) #this has the 1000 simulations in it for the fixed effects
as.mcmc(smfixef)
Markov Chain Monte Carlo (MCMC) output:
Start = 1
End = 1000
Thinning interval = 1
(Intercept) Time Diet2 Diet3 Diet4
[1,] 17.52609243 8.381517 7.47169881 46.442343 19.7164997 #simulation 1
[2,] 16.52854430 8.859378 8.83279931 29.017547 25.4610474 #simulation 2
[3,] 4.00702870 8.830302 29.68309621 47.459395 35.1939344 #simulation 3
[4,] 16.44162722 8.599929 15.87393285 31.946265 33.7513144 #simulation 4
[5,] 21.07173579 8.596701 1.81909415 28.934133 19.0499998 #simulation 5
etc.
Then for each column you can code which simulations were above or below zero:
p_Time=if_else(smfixef[,2]>0, 1,0) #Time variable (i.e., 2nd column)
Because the point estimate for the Time
variable is positive, you want to count the number of times an estimate for that variable was below zero:
sum_p_Time=sum(p_Time<1)
> sum_p_Time
0
In this case it says all the estimates are above zero, therefore the Bayesian p-value is < 0.001. This supports what we see when we look at only the point estimate and 95 % credible intervals (i.e. Time
estimate is 8.80 and 95 % credible intervals are (8.38, 9.06). In both cases we see there is strong support for Time
having an effect on weight
.