Tags: r, ggplot2, regression, data-visualization, ancova

Regression lines in ggplot


I have a 2 x 2 design, and the resulting plot shows 4 groups in different colours with 4 regression lines. I want to keep the 4 colours but show only 2 lines, one for each level of one of the variables, rather than all 4. The data is here:

PLD.df <- structure(list(Site = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Inshore", "OffReef"
), class = "factor"), Depth = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("Deep", "Shallow"
), class = "factor"), PLD = c(37L, 38L, 47L, 51L, 51L, 53L, 34L, 
39L, 40L, 45L, 49L, 49L, 26L, 29L, 35L, 35L, 36L, 36L, 37L, 38L, 
38L, 40L, 41L, 46L, 47L, 52L, 37L, 38L, 40L, 45L, 45L), Location = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L), .Label = c("ID", 
"IS", "OD", "OS"), class = "factor"), b = c(0.052, 0.05, 0.039, 
0.043, 0.036, 0.033, 0.055, 0.051, 0.048, 0.046, 0.041, 0.04, 
0.05, 0.05, 0.051, 0.049, 0.056, 0.052, 0.047, 0.045, 0.047, 
0.045, 0.045, 0.045, 0.039, 0.038, 0.046, 0.049, 0.046, 0.044, 
0.041)), .Names = c("Site", "Depth", "PLD", "Location", "b"), class = "data.frame", row.names = c(NA, 
-31L))

The plot is below:

[Figure: ANCOVA plot]

and the code I used to create it is here:

# cb_palette must hold at least four colours (one definition appears in the solution below)
ggplot(PLD.df, aes(x=PLD, y=b, colour=Location)) + 
  geom_point(aes(shape=Location),size=3) + 
  scale_shape(solid=FALSE) + 
  scale_colour_manual(values=cb_palette) + 
  geom_smooth(aes(linetype=Location),method=lm, se=FALSE, fullrange=FALSE) + 
  theme(panel.border=element_rect(colour="black", fill=NA,size=3),
        panel.background=element_rect(fill=NA),
        panel.grid.major=element_blank(),
        panel.grid.minor=element_blank()) + 
  theme(legend.position="none")

Would the easiest way be to remove the lines altogether and then use the predictvals() function to redraw the required ones? I would like to show regression lines only for the two Site levels, "Inshore" and "OffReef", while retaining the colours for all 4 Location groups.

Note: This is my first question here so apologies if my question format is not correct or I haven't included all of the necessary information. Thanks!


Solution

  • If I understand you correctly: specify only x and y inside the ggplot(aes(..)) call, map colour and shape to Location in geom_point, and then inside geom_smooth group by Site instead of Location. This gives one fitted line per Site, pooled over Depth:

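    library(ggplot2)

    # Eight-colour palette; only the first four are used (one per Location)
    cb_palette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", 
                   "#F0E442", "#0072B2", "#D55E00", "#CC79A7")
    
    # Only x and y are set globally, so each layer chooses its own grouping
    ggplot(PLD.df, aes(x=PLD, y=b)) + 
      # Points keep one colour and shape per Location (4 groups)
      geom_point(aes(shape=Location, colour=Location), size=3) + 
      scale_shape(solid=FALSE) + 
      scale_colour_manual(values=cb_palette) + 
      # One fitted line per Site (2 lines), both drawn in gray
      geom_smooth(aes(linetype=Site),
                  method="lm", se=FALSE, fullrange=FALSE, colour="gray") + 
      theme(panel.border=element_rect(colour="black", fill=NA, size=3),
            panel.background=element_rect(fill=NA),
            panel.grid.major=element_blank(),
            panel.grid.minor=element_blank()) + 
      theme(legend.position="none")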
    

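    If you would rather compute the lines yourself, in the spirit of the predictvals() idea from the question, something like the following may work. It is only a sketch: predict_by_site is a hypothetical helper written for this example (it is not from any package), the data frame is a compact rebuild of the PLD.df posted in the question, and the default colour scale is used instead of cb_palette.

```r
# Compact rebuild of the question's PLD.df (same 31 rows, same values).
PLD.df <- data.frame(
  Site     = factor(rep(c("Inshore", "OffReef"), times = c(12, 19))),
  Depth    = factor(rep(c("Deep", "Shallow", "Deep", "Shallow"),
                        times = c(6, 6, 14, 5))),
  Location = factor(rep(c("ID", "IS", "OD", "OS"), times = c(6, 6, 14, 5))),
  PLD = c(37, 38, 47, 51, 51, 53,
          34, 39, 40, 45, 49, 49,
          26, 29, 35, 35, 36, 36, 37, 38, 38, 40, 41, 46, 47, 52,
          37, 38, 40, 45, 45),
  b = c(0.052, 0.05, 0.039, 0.043, 0.036, 0.033,
        0.055, 0.051, 0.048, 0.046, 0.041, 0.04,
        0.05, 0.05, 0.051, 0.049, 0.056, 0.052, 0.047,
        0.045, 0.047, 0.045, 0.045, 0.045, 0.039, 0.038,
        0.046, 0.049, 0.046, 0.044, 0.041)
)

# Hypothetical helper: fit b ~ PLD within one Site and return the fitted
# values at the two ends of that Site's observed PLD range.
predict_by_site <- function(d) {
  fit <- lm(b ~ PLD, data = d)
  new <- data.frame(PLD = range(d$PLD))
  new$b <- unname(predict(fit, newdata = new))
  new$Site <- d$Site[1]
  new
}

# One pair of end points per Site: 2 sites x 2 points = 4 rows.
site_lines <- do.call(rbind, lapply(split(PLD.df, PLD.df$Site), predict_by_site))
rownames(site_lines) <- NULL

# Plot the points by Location and add the two precomputed lines.
# The plotting step is skipped if ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(PLD.df, aes(x = PLD, y = b)) +
    geom_point(aes(shape = Location, colour = Location), size = 3) +
    scale_shape(solid = FALSE) +
    geom_line(data = site_lines, aes(linetype = Site), colour = "gray")
  print(p)
}
```

    For straight-line fits this should draw the same two lines as the geom_smooth(aes(linetype=Site), ...) version, so the geom_smooth route is usually less work. Precomputing mainly helps if you want the fitted values for something else, or want to plot a model that geom_smooth cannot fit directly.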