Search code examples
rggplot2predictinteraction

Visualising a three way interaction between two continuous variables and one categorical variable in R


I have a model in R that includes a significant three-way interaction between two continuous independent variables IVContinuousA, IVContinuousB, IVCategorical and one categorical variable (with two levels: Control and Treatment). The dependent variable is continuous (DV).

model <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical)

You can find the data here

I am trying to find out a way to visualise this in R to ease my interpretation of it (perhaps in ggplot2?).

Somewhat inspired by this blog post I thought that I could dichotomise IVContinuousB into high and low values (so it would be a two-level factor itself:

IVContinuousBHigh <- mean(IVContinuousB) + sd (IVContinuousB) 
IVContinuousBLow <- mean(IVContinuousB) - sd (IVContinuousB)

I then planned to plot the relationship between DV and IV ContinuousA and fit lines representing the slopes of this relationship for different combinations of IVCategorical and my new dichotomised IVContinuousB:

IVCategoricalControl and IVContinuousBHigh
IVCategoricalControl and IVContinuousBLow
IVCategoricalTreatment and IVContinuousBHigh
IVCategoricalTreatment and IVContinuousBLow

My first question is does this sound like a viable solution to producing an interpretable plot of this three-way-interaction? I want to avoid 3D plots if possible as I don't find them intuitive... Or is there another way to go about it? Maybe facet plots for the different combinations above?

If it is an ok solution, my second question is how to I generate the data to predict the fit lines to represent the different combinations above?

Third question - does anyone have any advice as to how to code this up in ggplot2?

I posted a very similar question on Cross Validated but because it is more code related I thought I would try here instead (I will remove the CV post if this one is more relevant to the community :) )

Thanks so much in advance,

Sarah

Note that there are NAs (left as blanks) in the DV column and the design is unbalanced - with slightly different numbers of datapoints in the Control vs Treatment groups of the variable IVCategorical.

FYI I have the code for visaualising a two-way interaction between IVContinuousA and IVCategorical:

A<-ggplot(data=data,aes(x=AOTAverage,y=SciconC,group=MisinfoCondition,shape=MisinfoCondition,col = MisinfoCondition,))+geom_point(size = 2)+geom_smooth(method='lm',formula=y~x)

But what I want is to plot this relationship conditional on IVContinuousB....


Solution

  • Here are a couple of options for visualizing the model output in two dimensions. I'm assuming here that the goal here is to compare Treatment to Control

    library(tidyverse)
      theme_set(theme_classic() +
              theme(panel.background=element_rect(colour="grey40", fill=NA))
    
    dat = read_excel("Some Data.xlsx")  # I downloaded your data file
    
    mod <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data=dat)
    
    # Function to create prediction grid data frame
    make_pred_dat = function(data=dat, nA=20, nB=5) {
      nCat = length(unique(data$IVCategorical))
      d = with(data, 
               data.frame(IVContinuousA=rep(seq(min(IVContinuousA), max(IVContinuousA), length=nA), nB*2),
                          IVContinuousB=rep(rep(seq(min(IVContinuousB), max(IVContinuousB), length=nB), each=nA), nCat),
                          IVCategorical=rep(unique(IVCategorical), each=nA*nB)))
    
      d$DV = predict(mod, newdata=d)
    
      return(d)
    }
    

    IVContinuousA vs. DV by levels of IVContinuousB

    The roles of IVContinuousA and IVContinuousB can of course be switched here.

    ggplot(make_pred_dat(), aes(x=IVContinuousA, y=DV, colour=IVCategorical)) + 
      geom_line() +
      facet_grid(. ~ round(IVContinuousB,2)) +
      ggtitle("IVContinuousA vs. DV, by Level of IVContinousB") +
      labs(colour="")
    

    enter image description here

    You can make a similar plot without faceting, but it gets difficult to interpret as the number of IVContinuousB levels increases:

    ggplot(make_pred_dat(nB=3), 
           aes(x=IVContinuousA, y=DV, colour=IVCategorical, linetype=factor(round(IVContinuousB,2)))) + 
      geom_line() +
      #facet_grid(. ~ round(IVContinuousB,2)) +
      ggtitle("IVContinuousA vs. DV, by Level of IVContinousB") +
      labs(colour="", linetype="IVContinuousB") +
      scale_linetype_manual(values=c("1434","11","62")) +
      guides(linetype=guide_legend(reverse=TRUE))
    

    enter image description here

    Heat map of the model-predicted difference, DV treatment - DV control on a grid of IVContinuousA and IVContinuousB values

    Below, we look at the difference between treatment and control at each pair of IVContinuousA and IVContinuousB.

    ggplot(make_pred_dat(nA=100, nB=100) %>% 
             group_by(IVContinuousA, IVContinuousB) %>% 
             arrange(IVCategorical) %>% 
             summarise(DV = diff(DV)), 
           aes(x=IVContinuousA, y=IVContinuousB)) + 
      geom_tile(aes(fill=DV)) +
      scale_fill_gradient2(low="red", mid="white", high="blue") +
      labs(fill=expression(Delta*DV~(Treatment - Control)))
    

    enter image description here