
Find the pair of most correlated variables


Suppose I have a data frame of 20 numeric columns (variables). I can use R's cor function to get the correlation coefficients as a matrix, or visualize the correlation matrix with the coefficients labeled. If I just want to sort the variable pairs by their correlation coefficient, how can I do this in R?


Solution

  • Solution using corrr:

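    corrr is a package for exploring correlations in R that focuses on creating and working with data frames of correlations. correlate() builds the correlation data frame, stretch() turns it into long format with one row per pair (columns x, y, r), and dplyr's arrange() sorts the rows.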

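    library(corrr)
    library(dplyr)                     # provides arrange()
    matrix(rnorm(100), 5) %>%          # 5 observations of 20 variables
        correlate() %>%                # correlation data frame, NA on the diagonal
        stretch() %>%                  # long format: one row per (x, y) pair, each pair twice
        arrange(r)                     # ascending by r; the NA diagonal rows sort last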
    

  • Solution using reshape2 & data.table:

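    You can reshape the cor result into long format with reshape2::melt and then order (sort) it by the correlation value. Call reshape2::melt explicitly, since data.table's own melt is designed for data.table input rather than matrices.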

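    library(data.table)
    corMatrix <- cor(matrix(rnorm(100), 5))     # 20 x 20 correlation matrix
    # long format (Var1, Var2, value), then sort ascending by correlation;
    # each pair appears twice and the diagonal (value 1) is included
    setDT(reshape2::melt(corMatrix))[order(value)]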
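Both solutions list every pair twice, as (x, y) and (y, x), and sort by signed value, so the most negative correlations come first. To find the most strongly correlated pairs regardless of sign, here is a sketch of a corrr variant: shave() blanks the upper triangle so each pair survives once, stretch(na.rm = TRUE) drops the blanked cells, and the rows are sorted by absolute r. The seeded random matrix is only illustrative data standing in for your 20 columns.

```r
library(corrr)
library(dplyr)  # arrange() and desc() come from dplyr

set.seed(1)  # illustrative data only: 5 observations of 20 numeric variables
strongest <- matrix(rnorm(100), 5) %>%
  correlate() %>%            # correlation data frame, diagonal set to NA
  shave() %>%                # set the upper triangle to NA so each pair appears once
  stretch(na.rm = TRUE) %>%  # long format (x, y, r) without the NA cells
  arrange(desc(abs(r)))      # strongest correlations first, regardless of sign

head(strongest)  # the first row is the most correlated pair
```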
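If you would rather avoid extra packages, the same ranking can be sketched in base R: which(upper.tri(cm), arr.ind = TRUE) returns the row and column index of each unordered pair (diagonal excluded), and order() sorts the pairs by absolute correlation. The data frame df below is hypothetical random data standing in for your 20 numeric columns.

```r
# Base R only: no packages needed
set.seed(1)
df <- as.data.frame(matrix(rnorm(100), 5))  # hypothetical data: 5 rows, 20 numeric columns V1..V20

cm  <- cor(df)                               # 20 x 20 Pearson correlation matrix
idx <- which(upper.tri(cm), arr.ind = TRUE)  # (row, col) of each unordered pair, no diagonal

pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = cm[idx]                             # matrix indexing pulls one value per pair
)
pairs <- pairs[order(abs(pairs$r), decreasing = TRUE), ]  # strongest first

head(pairs, 1)  # the single most correlated pair
```

Replace abs(pairs$r) with pairs$r to rank by signed value instead, which puts the strongest positive correlations first.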