
Using ggpairs to get correlation values


I used ggpairs to get the following plots:

[Image: ggpairs plot matrix showing Power, Span, Length, and other variables]

I want to know which pair has the largest absolute correlation. However, how can I determine this when some of the pairs, such as Power-Span, Power-Length, and Span-Length, show a scatter plot instead of a correlation value? Also, is there an easier way to view the correlations as text, rather than reading them off the image?


Solution

  • The correlation coefficient between Power and Span is the same as the one between Span and Power. Pearson's correlation coefficient is the covariance of the two variables divided by the product of their standard deviations, r = cov(x, y) / (sd(x) * sd(y)), and every part of that formula is symmetric in x and y, so it does not matter which variable is on which axis. That is why ggpairs only needs to show each pair once: you can read the correlation coefficients in the upper-right panels and see the corresponding scatter plots in the lower-left panels.
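
    As a quick check of that formula, the sketch below computes r directly from cov() and sd() on simulated data (the data and the names r_xy, r_yx, slope_y_on_x, slope_x_on_y are illustrative, not from the question) and compares it with cor() in both argument orders. It also fits least-squares lines in both directions to show that, unlike r, the best-fit slope does depend on which variable is treated as the response; the product of the two slopes equals r^2.

    ```r
    # Simulated data with a real linear relationship (illustrative only).
    set.seed(1)
    x <- runif(100)
    y <- 2 * x + rnorm(100)

    # Pearson's r from its definition: covariance over product of SDs.
    r_xy <- cov(x, y) / (sd(x) * sd(y))
    r_yx <- cov(y, x) / (sd(y) * sd(x))

    # Both agree with cor() and with each other: the formula is symmetric.
    all.equal(r_xy, cor(x, y))
    all.equal(r_xy, r_yx)

    # Least-squares slopes are NOT symmetric: y ~ x differs from x ~ y.
    slope_y_on_x <- unname(coef(lm(y ~ x))[2])
    slope_x_on_y <- unname(coef(lm(x ~ y))[2])
    c(slope_y_on_x, slope_x_on_y)

    # Their product is cov^2 / (var(x) * var(y)), i.e. r squared.
    all.equal(slope_y_on_x * slope_x_on_y, r_xy^2)
    ```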

    The cor function returns the correlation coefficient between two vectors. The order does not matter.

    set.seed(123)
    x <- runif(100)
    y <- rnorm(100)
    cor(x, y)
    #> [1] 0.05564807
    cor(y, x)
    #> [1] 0.05564807
    

    If you feed a data.frame (or similar) to cor(), you will get a correlation matrix of the correlation coefficients between each pair of variables.

    set.seed(123)
    df <- data.frame(x = rnorm(100),
                     y = runif(100),
                     z = rpois(100, 1),
                     w = rbinom(100, 1, 0.5))
    cor(df)
    #>             x           y          z           w
    #> x  1.00000000  0.05564807 0.13071975 -0.14978770
    #> y  0.05564807  1.00000000 0.09039201 -0.09250531
    #> z  0.13071975  0.09039201 1.00000000  0.11929637
    #> w -0.14978770 -0.09250531 0.11929637  1.00000000
    

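    The matrix is symmetric about its diagonal: for example, the entry in row x, column w equals the entry in row w, column x. The diagonal is all 1s because every variable is perfectly correlated with itself.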
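
    Because of that symmetry, the lower triangle only repeats the upper one. If you want a text view laid out like the upper-right panels of ggpairs, one option (a base-R sketch that recreates the same simulated df so it runs on its own; the name m is illustrative) is to blank out the lower triangle before printing:

    ```r
    # Recreate the example data frame so this block is self-contained.
    set.seed(123)
    df <- data.frame(x = rnorm(100),
                     y = runif(100),
                     z = rpois(100, 1),
                     w = rbinom(100, 1, 0.5))

    # Round for readability, then drop the redundant lower triangle.
    m <- round(cor(df), 3)
    m[lower.tri(m)] <- NA

    # Print the NAs as blanks so each pair appears exactly once.
    print(m, na.print = "")
    ```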

    To identify the pair with the largest absolute correlation programmatically (ignoring the diagonal, whose values are always 1), reshape the matrix into long format, drop the diagonal, and sort by absolute value:

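    library(dplyr)
    library(tibble)
    library(tidyr)
    cor(df) %>%
      as_tibble(rownames = "var1") %>%   # matrix -> tibble, row names kept as var1
      pivot_longer(cols = -var1, names_to = "var2", values_to = "coeff") %>%
      filter(var1 != var2) %>%           # drop the diagonal (always 1)
      arrange(desc(abs(coeff)))          # largest absolute correlation first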
    #> # A tibble: 12 x 3
    #>   var1  var2    coeff
    #>   <chr> <chr>   <dbl>
    #>  1 x     w     -0.150 
    #>  2 w     x     -0.150 
    #>  3 x     z      0.131 
    #>  4 z     x      0.131 
    #>  5 z     w      0.119 
    #>  6 w     z      0.119 
    #>  7 y     w     -0.0925
    #>  8 w     y     -0.0925
    #>  9 y     z      0.0904
    #> 10 z     y      0.0904
    #> 11 x     y      0.0556
    #> 12 y     x      0.0556
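
    The tidy output lists every pair twice (x-w and w-x, and so on). A variant of the same pipeline, sketched below under the assumption that you only want each unordered pair once, replaces var1 != var2 with var1 < var2, which drops both the diagonal and the mirrored duplicates, and then uses slice_max() (available in dplyr 1.0.0 and later) to pull out the strongest pair. The names cor_pairs and strongest are illustrative.

    ```r
    library(dplyr)
    library(tibble)
    library(tidyr)

    # Same simulated data as above, recreated so this block runs on its own.
    set.seed(123)
    df <- data.frame(x = rnorm(100),
                     y = runif(100),
                     z = rpois(100, 1),
                     w = rbinom(100, 1, 0.5))

    cor_pairs <- cor(df) %>%
      as_tibble(rownames = "var1") %>%
      pivot_longer(cols = -var1, names_to = "var2", values_to = "coeff") %>%
      # Keep each unordered pair once; this also removes the diagonal.
      filter(var1 < var2) %>%
      arrange(desc(abs(coeff)))

    cor_pairs

    # The pair(s) with the largest absolute correlation (ties are kept).
    strongest <- slice_max(cor_pairs, abs(coeff), n = 1)
    strongest
    ```

    With k variables this leaves k * (k - 1) / 2 rows, six for the four-column example.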