Search code examples
rpredictionglminteractionpredict

plotting glm interactions: "newdata=" structure in predict() function


My problem is with the predict() function, its structure, and plotting the predictions.

Using the predictions coming from my model, I would like to visualize how my significant factors (and their interaction) affect the probability of my response variable.

My model:

m1  <-glm ( mating ~  behv * pop + 
                      I(behv^2) * pop + condition, 
                      data=data1, family=binomial(logit)) 

mating: individual has mated or not (factor, binomial: 0,1)

pop: population (factor, 4 levels)

behv: behaviour (numeric, scaled & centered)

condition: relative fat content (numeric, scaled & centered)

Significant effects after running the glm:

pop1

condition

behv*pop2

behv^2*pop1

Although I have read the help pages, previous answers to similar questions, tutorials etc., I couldn't figure out how to structure the newdata= part in the predict() function. The effects I want to visualise (given above) might give a clue of what I want: For the "behv*pop2" interaction, for example, I would like to get a graph that shows how the behaviour of individuals from population-2 can influence whether they will mate or not (probability from 0 to 1).


Solution

  • Really the only thing that predict expects is that the names of the columns in newdata exactly match the column names used in the formula. And you must have values for each of your predictors. Here's some sample data.

    #sample data
    set.seed(16)
    data <- data.frame(
        mating=sample(0:1, 200, replace=T),
        pop=sample(letters[1:4], 200, replace=T),
        behv = scale(rpois(200,10)),
        condition = scale(rnorm(200,5))
    )
    
    data1<-data[1:150,]     #for model fitting
    data2<-data[51:200,-1]  #for predicting
    

    Then this will fit the model using data1 and predict into data2

    model<-glm ( mating ~  behv * pop + 
        I(behv^2) * pop + condition, 
        data=data1, 
        family=binomial(logit)) 
    
    predict(model, newdata=data2, type="response")
    

    Using type="response" will give you the predicted probabilities.

    Now to make predictions, you don't have to use a subset from the exact same data.frame. You can create a new one to investigate a particular range of values (just make sure the column names match up. So in order to explore behv*pop2 (or behv*popb in my sample data), I might create a data.frame like this

    popbbehv<-data.frame(
        pop="b",
        behv=seq(from=min(data$behv), to=max(data$behv), length.out=100),
        condition = mean(data$condition)
    )
    

    Here I fix pop="b" so i'm only looking at the pop, and since I have to supply condition as well, i fix that at the mean of the original data. (I could have just put in 0 since the data is centered and scaled.) Now I specify a range of behv values i'm interested in. Here i just took the range of the original data and split it into 100 regions. This will give me enough points to plot. So again i use predict to get

    popbbehvpred<-predict(model, newdata=popbbehv, type="response")
    

    and then I can plot that with

    plot(popbbehvpred~behv, popbbehv, type="l")
    

    results plot

    Although nothing is significant in my fake data, we can see that higher behavior values seem to result in less mating for population B.