pairs.panels function in R for specific correlations of columns in dataframe


I have a very large data set and want to compute correlations for many different, arbitrary combinations of columns. For instance, I may want the correlation of the 3rd column with the 12th through 15th columns, or of the 20th column with the 1st through 4th columns, and so on.

I am currently using the pairs.panels() function from the psych package, but I can't restrict it to the specific column pairings I want.


Solution

  • Here is df, a dummy data.frame with 26 columns of 30 random values each, so the correlation between any two distinct columns should be fairly low.

    # 26 columns of 30 uniform random values each, named A through Z
    cols <- lapply(1:26, function(dummy) runif(30))
    df <- do.call(data.frame, cols)
    names(df) <- LETTERS
    

    If you want the correlation between column "X" and each of columns "A", "C", and "E", use sapply with the cor function:

    sapply(df[c("A","C","E")], cor, df["X"])
    

    Or use column numbers:

    sapply(df[c(1,3,5)], cor, df[24])
    

    If you want every pairwise correlation between two groups of columns, try:

    firstGroup <- c(1,3,5,20)
    secondGroup <- c(14,20,25)
    combos <- expand.grid(firstGroup, secondGroup)
    result <- mapply(cor, df[combos$Var1], df[combos$Var2])
    resultAsMatrix <- matrix(result, nrow = length(firstGroup), dimnames = list(firstGroup, secondGroup))
    

    To get (column 20 appears in both groups, hence the 1.0 where it meets itself):

    > resultAsMatrix
                14         20          25
    1  -0.22949844 -0.1527876 -0.11877405
    3   0.23174965  0.0311125  0.33570756
    5   0.01491815 -0.1263007 -0.16688800
    20  0.18007802  1.0000000  0.04638838
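    The matrix() call above lines up correctly because expand.grid varies its first argument fastest, which matches the column-by-column order in which matrix() fills its cells. A small sketch that only inspects the index grid:

    ```r
    firstGroup <- c(1, 3, 5, 20)
    secondGroup <- c(14, 20, 25)

    # One row per (Var1, Var2) pair; Var1 cycles through firstGroup
    # before Var2 moves on to its next value.
    combos <- expand.grid(firstGroup, secondGroup)
    head(combos, 5)
    #   Var1 Var2
    # 1    1   14
    # 2    3   14
    # 3    5   14
    # 4   20   14
    # 5    1   20
    ```

    So the first four results belong to column 14, the next four to column 20, and so on, which is exactly one matrix column at a time.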
    

    EDIT:

    @user20650 pointed out that cor can already compute the correlations between the columns of two matrices (or data frames) directly. So:

    cor(df[firstGroup], df[secondGroup])
    

    yields the same matrix I built manually above, labeled with column names instead of column numbers:

                N          T           Y
    A -0.22949844 -0.1527876 -0.11877405
    C  0.23174965  0.0311125  0.33570756
    E  0.01491815 -0.1263007 -0.16688800
    T  0.18007802  1.0000000  0.04638838
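    As a quick sanity check, the two approaches can be compared on the same data. This is a minimal sketch; the seed and the names manual, builtin, and same are assumptions introduced here for illustration.

    ```r
    set.seed(42)  # assumed seed, only to make the dummy data reproducible
    df <- as.data.frame(matrix(runif(30 * 26), nrow = 30))
    names(df) <- LETTERS

    firstGroup <- c(1, 3, 5, 20)
    secondGroup <- c(14, 20, 25)

    # Manual route: one cor() call per pair, reshaped into a matrix
    combos <- expand.grid(firstGroup, secondGroup)
    manual <- matrix(mapply(cor, df[combos$Var1], df[combos$Var2]),
                     nrow = length(firstGroup))

    # Built-in route: cor() on the two column subsets
    builtin <- cor(df[firstGroup], df[secondGroup])

    # Compare values only; the dimnames differ between the two results
    same <- isTRUE(all.equal(manual, unname(builtin)))
    same
    ```

    The single cor call is shorter and keeps the column names as row and column labels, so it is probably the more convenient choice for a wide data set.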