I used ggpairs to get the following plots:
I want to know which pair has the largest absolute correlation. However, how am I supposed to ascertain this when some of the pairs, such as Power-Span, Power-Length, Span-Length, etc., are covered by plots? Also, is there an easier way to view the correlations (in text format), rather than having to view them through the image?
The correlation coefficient between Power and Span is the same as the one between Span and Power. The (Pearson) correlation coefficient is the covariance of the two variables divided by the product of their standard deviations, which is symmetric in the two variables, so it does not matter which series is on which axis. So you can read the correlation coefficients in the upper right and see the scatter plots in the lower left.
The cor() function returns the correlation coefficient between two vectors. The order does not matter.
set.seed(123)
x <- runif(100)
y <- rnorm(100)
cor(x, y)
[1] 0.05564807
cor(y, x)
[1] 0.05564807
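To make the symmetry concrete, here is a minimal sketch that writes Pearson's coefficient out by hand as the covariance divided by the product of the standard deviations. The name r_manual is only an illustrative variable, not something from the original code.

```r
# Sketch: Pearson's r computed by hand from covariance and standard deviations.
set.seed(123)
x <- runif(100)
y <- rnorm(100)

# cov(x, y) equals cov(y, x), and the denominator is a plain product,
# so swapping x and y cannot change the result.
r_manual <- cov(x, y) / (sd(x) * sd(y))

all.equal(r_manual, cor(x, y))   # manual value matches cor()
all.equal(cor(x, y), cor(y, x))  # order of the arguments does not matter
```

Both comparisons should return TRUE, which is why each pair only needs to appear once in the ggpairs grid.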
If you feed a data.frame (or similar) to cor(), you will get a correlation matrix of the correlation coefficients between each pair of variables.
set.seed(123)
df <- data.frame(x = rnorm(100),
                 y = runif(100),
                 z = rpois(100, 1),
                 w = rbinom(100, 1, .5))
cor(df)
            x           y          z           w
x  1.00000000  0.05564807 0.13071975 -0.14978770
y  0.05564807  1.00000000 0.09039201 -0.09250531
z  0.13071975  0.09039201 1.00000000  0.11929637
w -0.14978770 -0.09250531 0.11929637  1.00000000
You can see in this matrix the symmetry around the diagonal.
If you want to programmatically identify the largest (non-unity) correlation coefficient, you can do the following:
library(dplyr)
library(tidyr)
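cor(df) %>%
  # as_data_frame() is deprecated; as_tibble() keeps the row names as a column
  as_tibble(rownames = "var1") %>%
  pivot_longer(cols = -var1, names_to = "var2", values_to = "coeff") %>%
  # drop the diagonal (each variable against itself)
  filter(var1 != var2) %>%
  arrange(desc(abs(coeff)))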
# A tibble: 12 x 3
var1 var2 coeff
<chr> <chr> <dbl>
1 x w -0.150
2 w x -0.150
3 x z 0.131
4 z x 0.131
5 z w 0.119
6 w z 0.119
7 y w -0.0925
8 w y -0.0925
9 y z 0.0904
10 z y 0.0904
11 x y 0.0556
12 y x 0.0556
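Each pair shows up twice in that listing because the matrix is symmetric. If you only want the single strongest pair, a base R sketch along the following lines keeps just the upper triangle before searching. The names m, idx and top_pair are illustrative, and the example data frame is simply rebuilt so that the block runs on its own.

```r
# Sketch: pair with the largest absolute correlation, base R only.
set.seed(123)
df <- data.frame(x = rnorm(100),
                 y = runif(100),
                 z = rpois(100, 1),
                 w = rbinom(100, 1, .5))

m <- cor(df)
m[!upper.tri(m)] <- NA  # blank the diagonal and the duplicated lower triangle

# Position(s) of the largest absolute coefficient; NA cells are skipped by which()
idx <- which(abs(m) == max(abs(m), na.rm = TRUE), arr.ind = TRUE)

top_pair <- data.frame(var1  = rownames(m)[idx[, "row"]],
                       var2  = colnames(m)[idx[, "col"]],
                       coeff = m[idx])
top_pair
```

With the example data above this should pick out the x and w pair, matching the first row of the sorted table. For your own data, replace df with the columns you passed to ggpairs.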