Tags: r, ggplot2, plot, lattice

Ideas for plotting a large number of variables against their R-squared values (from linear regression models) in R, preferably with ggplot2


I built 231 linear regression models for my project, which leaves me with 231 R-squared values to present against the variable names. That is too many for a table, so I am looking for plotting ideas that put the R-squared values on the y-axis and the variable names on the x-axis. Running dput(head(df, 5)) gives the following, which should give you an idea of my data:

structure(list(Band = c(402, 411, 419, 427, 434), R.squared = c(0.044655015122032, 
0.852028718800355, 0.818617476505653, 0.825782272278991, 0.860844967662728
), Adj.Rsquared = c(-0.0614944276421867, 0.835587465333728, 0.798463862784058, 
0.806424746976656, 0.845383297403031), Intercept = c(0.000142126282140086, 
-0.00373545760470339, -0.00258909036368109, 0.000626075834918527, 
-3.3448513588372e-05), Slope = c(-0.00108714482110104, 0.393380133190131, 
0.443463459485279, 0.503881831479685, 0.480162723468755)), row.names = c(NA, 
5L), class = "data.frame")

Note that my full data have 231 observations. I want to plot Band (as a factor) on the x-axis and R.squared on the y-axis. I already tried geom_point() in ggplot2, but the result is cluttered and hard to read. Any ideas?

Update: when I use the code suggested by @Duck, I get a plot that is a little too messy for a scientific presentation.


Solution

  • If you have a large number of values, you can dodge the axis labels so that they are staggered instead of overlapping. Here is an example:

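    library(ggplot2)
    #Code: R.squared per band, with Band treated as a discrete factor
    ggplot(mdf, aes(x = factor(Band), y = R.squared)) +
      geom_point() +
      # Stagger the band labels over two positions so they do not overlap
      scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
      # Flip the axes so the bands run along the vertical axis
      coord_flip()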
    


    Some data used:

    #Data
    mdf <- structure(list(Band = c(402, 411, 419, 427, 434, 412, 421, 429, 
    437, 444, 422, 431, 439, 447, 454, 432, 441, 449, 457, 464), 
        R.squared = c(0.044655015122032, 0.852028718800355, 0.818617476505653, 
        0.825782272278991, 0.860844967662728, 0.044655015122032, 
        0.852028718800355, 0.818617476505653, 0.825782272278991, 
        0.860844967662728, 0.044655015122032, 0.852028718800355, 
        0.818617476505653, 0.825782272278991, 0.860844967662728, 
        0.044655015122032, 0.852028718800355, 0.818617476505653, 
        0.825782272278991, 0.860844967662728), Adj.Rsquared = c(-0.0614944276421867, 
        0.835587465333728, 0.798463862784058, 0.806424746976656, 
        0.845383297403031, -0.0614944276421867, 0.835587465333728, 
        0.798463862784058, 0.806424746976656, 0.845383297403031, 
        -0.0614944276421867, 0.835587465333728, 0.798463862784058, 
        0.806424746976656, 0.845383297403031, -0.0614944276421867, 
        0.835587465333728, 0.798463862784058, 0.806424746976656, 
        0.845383297403031), Intercept = c(0.000142126282140086, -0.00373545760470339, 
        -0.00258909036368109, 0.000626075834918527, -3.3448513588372e-05, 
        0.000142126282140086, -0.00373545760470339, -0.00258909036368109, 
        0.000626075834918527, -3.3448513588372e-05, 0.000142126282140086, 
        -0.00373545760470339, -0.00258909036368109, 0.000626075834918527, 
        -3.3448513588372e-05, 0.000142126282140086, -0.00373545760470339, 
        -0.00258909036368109, 0.000626075834918527, -3.3448513588372e-05
        ), Slope = c(-0.00108714482110104, 0.393380133190131, 0.443463459485279, 
        0.503881831479685, 0.480162723468755, -0.00108714482110104, 
        0.393380133190131, 0.443463459485279, 0.503881831479685, 
        0.480162723468755, -0.00108714482110104, 0.393380133190131, 
        0.443463459485279, 0.503881831479685, 0.480162723468755, 
        -0.00108714482110104, 0.393380133190131, 0.443463459485279, 
        0.503881831479685, 0.480162723468755)), row.names = c(NA, 
    -20L), class = "data.frame")
    

    The suggestion from @DaveArmstrong, ordering the bands by their R-squared value, is very helpful too (many thanks and credit to him):

    #Code 2: same plot, with the bands ordered by their R.squared value
    ggplot(mdf, aes(x = reorder(factor(Band), R.squared, mean), y = R.squared)) +
      geom_point() +
      scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
      coord_flip()
    


    Another option is a lollipop-style plot, which draws a segment from zero up to each point:

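    #Code 3: ordered points plus a segment from 0 to each R.squared value
    ggplot(mdf, aes(x = reorder(factor(Band), R.squared, mean), y = R.squared)) +
      geom_point() +
      # Each segment starts at y = 0 and ends at the band's R.squared
      geom_segment(aes(x = reorder(factor(Band), R.squared, mean),
                       xend = reorder(factor(Band), R.squared, mean),
                       y = 0,
                       yend = R.squared)) +
      scale_x_discrete(guide = guide_axis(n.dodge = 2)) +
      coord_flip()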
    

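
With all 231 bands, even dodged labels may stay crowded. One further idea is to keep every point but label only every n-th band. This is a minimal sketch and an assumption on my part, not something suggested in the comments. The helper every_nth() is hypothetical (it is not a ggplot2 function), and the data frame below rebuilds only the Band and R.squared columns of the sample data mdf.

```r
library(ggplot2)

# Band and R.squared columns of the sample data `mdf` (other columns omitted)
mdf <- data.frame(
  Band = c(402, 411, 419, 427, 434, 412, 421, 429, 437, 444,
           422, 431, 439, 447, 454, 432, 441, 449, 457, 464),
  R.squared = rep(c(0.044655015122032, 0.852028718800355, 0.818617476505653,
                    0.825782272278991, 0.860844967662728), times = 4)
)

# Band as a factor; levels of a numeric vector come out in ascending order
mdf$BandF <- factor(mdf$Band)

# Hypothetical helper: keep every n-th level to use as axis breaks
every_nth <- function(lev, n) lev[seq(1, length(lev), by = n)]

# All points are drawn, but only every 4th band gets a tick and a label
p <- ggplot(mdf, aes(x = BandF, y = R.squared)) +
  geom_point() +
  scale_x_discrete(breaks = every_nth(levels(mdf$BandF), 4)) +
  coord_flip()
p
```

The step of 4 is chosen for the 20 sample rows. For the full 231 bands a larger step, such as 10, is probably a better starting point. This keeps the bands in numeric order, so it does not combine well with reorder(), where readers usually need every label.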