Tags: r, ggplot2, ggpubr, ggpmisc

How to add shapes for another factor in ggplot for regression model


I am trying to map a second factor to point shape in a plot of a regression model. Here is the example:

library(ggpubr)
data(iris)
iris$ran <- as.factor(rep(c(1:2), each = 75))
fit <- lm(Sepal.Length ~ Petal.Width+Species+ran, data = iris)
ggplot(fit$model, aes_string(x = names(fit$model)[2], y = names(fit$model)[1], 
color=names(fit$model)[3], shape=names(fit$model)[4])) +
geom_point() +
geom_smooth(aes_string(fill = names(fit$model)[3], color = names(fit$model)[3]), 
method = "lm", col= "red", fullrange = TRUE) +
labs(x=expression(paste("Petal Width")),
     y=expression(paste("Sepal Length")),
     caption = paste("R2 =",signif(summary(fit)$r.squared, 2),
                     "\tAdj R2 =",signif(summary(fit)$adj.r.squared, 2),
                     "\tIntercept =",signif(fit$coef[[1]],2 ),
                     "\tSlope =",signif(fit$coef[[2]], 2),
                     "\tP =",signif(summary(fit)$coef[2,4], 2)))+
theme_classic2(base_size = 14)

I am getting a plot with four regression lines, one for each combination of "Species" and "ran" present in the data. Instead, I want regression lines only for "Species", with "ran" shown as different point shapes (without adding regression lines for "ran" to the plot).

I also want to change "R2" to R^2 in the caption, which I have been unable to do with the current script, and to change the legend for ran so that its title is "Random" and its levels are "Factor1" and "Factor2".

Thank you in advance for your help.


Solution

  • This alternative answer is, I think, simpler. With this approach it is also possible to use fullrange = TRUE and se = FALSE and to leave the points uncoloured, but that yields a plot that badly misrepresents the data. Although it does not produce the same caption, the code in my answer automatically shows the results of each of the three fits, and it would work unchanged with a different number of factor levels.

    The iris data are used here only as an example, so the fact that both widths and lengths are random variables can be ignored and OLS used. Otherwise, major axis regression would be preferable, and the code below could be rewritten using stat_ma_line() and stat_ma_eq(), with slight adjustments to the arguments passed to them.

    library(ggpmisc)
    #> Loading required package: ggpp
    #> Loading required package: ggplot2
    #> 
    #> Attaching package: 'ggpp'
    #> The following object is masked from 'package:ggplot2':
    #> 
    #>     annotate
    iris$ran <- factor(rep(c(1:2), each = 75), labels = paste("Factor", 1:2))
    
    ggplot(iris, aes(Petal.Width, Sepal.Length, colour = Species)) +
      geom_point(aes(shape = ran)) +
      stat_poly_line() + # se = FALSE can be added
      stat_poly_eq(aes(label = paste(after_stat(rr.label),
    #                                 after_stat(adj.rr.label),
                                     after_stat(eq.label), 
                                     after_stat(p.value.label),
    #                                 after_stat(n.label),
                                     sep = "*\", \"*"))) +
      labs(x = "Petal Width", y = "Sepal Length", shape = "Random") +
      theme_classic(base_size = 14)
    

    Created on 2021-09-10 by the reprex package (v2.0.1)
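
The major axis variant mentioned in the answer is not spelled out there, so here is a minimal sketch of what it might look like. It is not the answerer's code. It assumes a ggpmisc version recent enough to provide stat_ma_line() and stat_ma_eq(), and that the lmodel2 package, which these statistics rely on, is installed. The names dat and p_ma are placeholders chosen for this example.

```r
library(ggpmisc)  # attaches ggplot2 and ggpp as well

# Work on a copy so the built-in iris data set stays untouched.
# 'dat' and 'p_ma' are illustrative names, not taken from the answer.
dat <- iris
dat$ran <- factor(rep(1:2, each = 75), labels = paste("Factor", 1:2))

p_ma <- ggplot(dat, aes(Petal.Width, Sepal.Length, colour = Species)) +
  # shape is mapped only in the point layer, so lines are fitted per Species
  geom_point(aes(shape = ran)) +
  # major axis regression line, one per Species (needs 'lmodel2' installed)
  stat_ma_line() +
  # R^2, fitted equation and P-value, one label per Species
  stat_ma_eq(aes(label = paste(after_stat(rr.label),
                               after_stat(eq.label),
                               after_stat(p.value.label),
                               sep = "*\", \"*"))) +
  labs(x = "Petal Width", y = "Sepal Length", shape = "Random") +
  theme_classic(base_size = 14)

p_ma
```

As in the OLS version, mapping shape inside geom_point() rather than in ggplot() keeps "ran" out of the grouping used for the fitted lines. As far as I know, the P-values for major axis fits come from a permutation test, so they may change slightly between runs unless a seed is set with set.seed().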