Search code examples
rr2winbugswinbugs14

R2WinBUGS - logistic regression with simulated data


I am just wondering whether anyone has some R code that uses the package R2WinBUGS to run logistic regression - ideally with simulated data to generate the 'truth' and two continous co-variates.

Thanks.

Christian

PS:

Potential code to generate artificial data (one dimensional case) and run winbugs via r2winbugs (it does not work yet).

library(MASS)
library(R2WinBUGS)

setwd("d:/BayesianLogisticRegression")

n.site <- 150

X1<- sort(runif(n = n.site, min = -1, max =1))

xb <- 0.0 + 3.0*X1 

occ.prob <- 1/(1+exp(-xb))

plot(X1, occ.prob,xlab="X1",ylab="occ.prob")

true.presence <- rbinom(n = n.site, size = 1, prob = occ.prob)

plot(X1, true.presence,xlab="X1",ylab="true.presence")

# combine data as data frame and save
data <- data.frame(X1, true.presence)
write.matrix(data, file = "data.txt", sep = "\t")

sink("model.txt")
cat("
model {

# Priors
 alpha ~ dnorm(0,0.01)
 beta ~ dnorm(0,0.01)

# Likelihood
 for (i in 1:n) {
    C[i] ~ dbin(p[i], N)        # Note p before N
    logit(p[i]) <- alpha + beta *X1[i]
 }
}
",fill=TRUE)
sink()

# Bundle data
win.data <- list(mass = X1, n = length(X1))

# Inits function
inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}

# Parameters to estimate
params <- c("alpha", "beta")

# MCMC settings
nc <- 3 #Number of Chains
ni <- 1200 #Number of draws from posterior
nb <- 200 #Number of draws to discard as burn-in
nt <- 2 Thinning rate

# Start Gibbs sampling
out <- bugs(data=win.data, inits=inits, parameters.to.save=params, 
model.file="model.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
n.iter=ni, debug = TRUE)

Solution

  • Your script is exactly the way to do it. It is almost working, it just required one simple change to make it work:

    win.data <- list(X1 = X1, n = length(X1), C = true.presence, N = 1)
    

    Which defines which data go to WinBugs. The variable C must be filled with true.presence, N must be 1 according to the data you generated - note that this is a special case of binomial distribution for N = 1, which is called Bernoulli - a simple "coin flip".

    Here is the output:

    > print(out, dig = 3)
    Inference for Bugs model at "model.txt", fit using WinBUGS,
     3 chains, each with 1200 iterations (first 200 discarded), n.thin = 2
     n.sims = 1500 iterations saved
                mean    sd    2.5%     25%     50%     75%   97.5%  Rhat n.eff
    alpha     -0.040 0.221  -0.465  -0.187  -0.037   0.114   0.390 1.001  1500
    beta       3.177 0.478   2.302   2.845   3.159   3.481   4.165 1.000  1500
    deviance 136.438 2.059 134.500 135.000 135.800 137.200 141.852 1.001  1500
    

    as you see, the parameters correspond to the parameters used to generate the data. Also, if you compare with the frequentist solution, you see it corresponds.

    EDIT: but the typical logistic (~ binomial) regression would measure some counts with some upper value N[i], and it would allow for different N[i] for each observation. For example say the proportion of juveniles to the whole population (N). This would require just to add index to N in your model:

    C[i] ~ dbin(p[i], N[i])
    

    The data generation would look something like:

    N = rpois(n = n.site, lambda = 50) 
    juveniles <- rbinom(n = n.site, size = N, prob = occ.prob)
    win.data <- list(X1 = X1, n = length(X1), C = juveniles, N = N)
    

    (end of edit)

    For more examples from population ecology see books of Marc Kéry (Introduction to WinBUGS for ecologist, and especially Bayesian Population Analysis using WinBUGS: A hierarchical perspective, which is a great book).

    The complete script I used - the corrected script of yours is listed here (comparison with frequentist solution at the end):

    #library(MASS)
    library(R2WinBUGS)
    
    #setwd("d:/BayesianLogisticRegression")
    
    n.site <- 150
    
    X1<- sort(runif(n = n.site, min = -1, max =1))
    
    xb <- 0.0 + 3.0*X1 
    
    occ.prob <- 1/(1+exp(-xb)) # inverse logit
    
    plot(X1, occ.prob,xlab="X1",ylab="occ.prob")
    
    true.presence <- rbinom(n = n.site, size = 1, prob = occ.prob)
    
    plot(X1, true.presence,xlab="X1",ylab="true.presence")
    
    # combine data as data frame and save
    data <- data.frame(X1, true.presence)
    write.matrix(data, file = "data.txt", sep = "\t")
    
    sink("tmp_bugs/model.txt")
    cat("
    model {
    
    # Priors
     alpha ~ dnorm(0,0.01)
     beta ~ dnorm(0,0.01)
    
    # Likelihood
     for (i in 1:n) {
        C[i] ~ dbin(p[i], N)        # Note p before N
        logit(p[i]) <- alpha + beta *X1[i]
     }
    }
    ",fill=TRUE)
    sink()
    
    # Bundle data
    win.data <- list(X1 = X1, n = length(X1), C = true.presence, N = 1)
    
    # Inits function
    inits <- function(){ list(alpha=rlnorm(1), beta=rlnorm(1))}
    
    # Parameters to estimate
    params <- c("alpha", "beta")
    
    # MCMC settings
    nc <- 3 #Number of Chains
    ni <- 1200 #Number of draws from posterior
    nb <- 200 #Number of draws to discard as burn-in
    nt <- 2 #Thinning rate
    
    # Start Gibbs sampling
    out <- bugs(data=win.data, inits=inits, parameters.to.save=params, 
    model.file="model.txt", n.thin=nt, n.chains=nc, n.burnin=nb, 
    n.iter=ni, 
    working.directory = paste(getwd(), "/tmp_bugs/", sep = ""),
    debug = TRUE)
    
    print(out, dig = 3)
    
    # Frequentist approach for comparison
    m1 = glm(true.presence ~ X1, family = binomial)
    summary(m1)
    
    # normally, you should do it this way, but the above also works:
    #m2 = glm(cbind(true.presence, 1 - true.presence) ~ X1, family = binomial)