Search code examples
rcorrelationanova

Automatisation of correlations testing of linear model residuals


Is there a method/code available to automate the correlation testing procedure i am using? The automatisation code should enable the comparison of all columns within a dataset and also generate a correlation matrix as output. For now, i am manually changing the column names within the code and writing the result on an Excel sheet. However, as there are more than 60 observation columns this is laborious, hence the question.

Detailed reasons for this approach to correlation analysis can be found here.

I prepared a code example using an IRIS dataset. The IRIS dataset originally doesn't contain replications, however, 50 replications per observation and species are available. The code is the same as I'm using for the analysis of my data set.

mm1 <- lm(Sepal.Length ~ factor(Replication) + Species, data=iris)
res1 <- mm1$residuals
mm2 <- lm(Sepal.Width ~ factor(Replication) + Species, data=iris)
res2 <- mm2$residuals
cor.test(res1,res2)

Solution

  • I'm ignoring your Replication because you didn't provide data containing it. However, the following should be easy to adjust.

    DF <- iris
    
    library(data.table)
    setDT(DF)
    DFlong <- melt(DF, id.vars = "Species")
    
    library(nlme)
    mm <- lmList(value ~ Species | variable, data = DFlong, na.action = na.fail)
    
    res <- matrix(residuals(mm), nrow = nrow(DF), dimnames = list(NULL, rownames(coef(mm))))
    
    library(psych)
    cors <- corr.test(res)
    
    print(cors, digits = 2, short = FALSE)
    # Call:corr.test(x = res)
    # Correlation matrix 
    #              Sepal.Length Sepal.Width Petal.Length Petal.Width
    # Sepal.Length         1.00        0.53         0.76        0.36
    # Sepal.Width          0.53        1.00         0.38        0.47
    # Petal.Length         0.76        0.38         1.00        0.48
    # Petal.Width          0.36        0.47         0.48        1.00
    # Sample Size 
    # [1] 150
    # Probability values (Entries above the diagonal are adjusted for multiple tests.) 
    #              Sepal.Length Sepal.Width Petal.Length Petal.Width
    # Sepal.Length            0           0            0           0
    # Sepal.Width             0           0            0           0
    # Petal.Length            0           0            0           0
    # Petal.Width             0           0            0           0
    # 
    #  Confidence intervals based upon normal theory.  To get bootstrapped values, try cor.ci
    #             raw.lower raw.r raw.upper raw.p lower.adj upper.adj
    # Spl.L-Spl.W      0.40  0.53      0.64     0      0.36      0.67
    # Spl.L-Ptl.L      0.68  0.76      0.82     0      0.65      0.84
    # Spl.L-Ptl.W      0.22  0.36      0.50     0      0.22      0.50
    # Spl.W-Ptl.L      0.23  0.38      0.51     0      0.21      0.52
    # Spl.W-Ptl.W      0.34  0.47      0.59     0      0.30      0.61
    # Ptl.L-Ptl.W      0.35  0.48      0.60     0      0.31      0.63
    
    cors$r
    #             Sepal.Length Sepal.Width Petal.Length Petal.Width
    #Sepal.Length    1.0000000   0.5302358    0.7561642   0.3645064
    #Sepal.Width     0.5302358   1.0000000    0.3779162   0.4705346
    #Petal.Length    0.7561642   0.3779162    1.0000000   0.4844589
    #Petal.Width     0.3645064   0.4705346    0.4844589   1.0000000
    
    cors$p
    #             Sepal.Length  Sepal.Width Petal.Length  Petal.Width
    #Sepal.Length 0.000000e+00 1.495184e-11 2.860911e-28 4.524678e-06
    #Sepal.Width  2.990368e-12 0.000000e+00 3.724687e-06 3.699152e-09
    #Petal.Length 4.768185e-29 1.862343e-06 0.000000e+00 1.340992e-09
    #Petal.Width  4.524678e-06 1.233051e-09 3.352479e-10 0.000000e+00
    
    cors$p.adj
    #[1] 1.495184e-11 2.860911e-28 3.724687e-06 4.524678e-06 3.699152e-09 1.340992e-09
    

    You should use the adjusted p-values (i.e., the upper triangle of the cors$p matrix) if you do more than one comparison.