Tags: r, correlation, r-corrplot

correlation of log and sqrt of the same matrix


I have a data frame subs containing 9 variables, and I want to get the correlation between the log and the square root of those variables.

So I first calculate the 2 matrices:

library(corrplot)

logs = log(subs)
sqroots = sqrt(subs)
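A minimal sketch of where non-finite values come from, using a small hypothetical data frame toy in place of subs (the real data is only linked below): log(0) is -Inf, while a negative entry gives NaN from both log() and sqrt(). Applied to an all-numeric data frame, log() and sqrt() return a data frame.

```r
# Hypothetical toy data standing in for subs: one zero, one negative value.
toy <- data.frame(a = c(0, 1, 10), b = c(4, -1, 9))

toy_logs <- suppressWarnings(log(toy))    # log(-1) would warn "NaNs produced"
toy_sqrts <- suppressWarnings(sqrt(toy))  # sqrt(-1) warns as well

print(class(toy_logs))  # still a data.frame, not a matrix
print(toy_logs)         # a[1] is -Inf (log(0)), b[2] is NaN (log(-1))
print(toy_sqrts)        # b[2] is NaN; sqrt() gives no -Inf for finite input
```

Since is.infinite(NaN) is FALSE, replacing infinite values leaves any NaN in place; is.na(NaN) is TRUE, though, so use = "complete.obs" should still drop those rows in cor().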

Then I replace any infinite values with NA:

logs = do.call(data.frame, lapply(logs, function(x) replace(x, is.infinite(x), NA)))

sqroots = do.call(data.frame, lapply(sqroots, function(x) replace(x, is.infinite(x), NA)))
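To check that this replacement step is not the source of the error, a small sketch on a hypothetical data frame (not the real subs): afterwards no infinite values remain, and the -Inf produced by log(0) has become NA.

```r
# Hypothetical toy data; log(0) yields -Inf in column a.
toy_logs <- log(data.frame(a = c(0, 1, 10), b = c(4, 2, 9)))

# The question's approach: replace infinite values column by column.
toy_logs <- do.call(data.frame,
                    lapply(toy_logs, function(x) replace(x, is.infinite(x), NA)))

# Sanity checks on the result.
print(any(sapply(toy_logs, is.infinite)))  # FALSE: no infinite values left
print(colSums(is.na(toy_logs)))            # one NA in a, none in b
```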

Then I use corrplot to plot the correlation matrix:

corrplot(cor(logs,sqroots, use = "complete.obs"), order = "AOE")

But it gives the error:

Error in e1 > 0 : invalid comparison with complex values

What am I doing wrong here? Thanks in advance for any help!

subs: https://pastebin.com/raw/y35FG2ZV
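For context, a sketch of what cor(logs, sqroots, use = "complete.obs") returns, using hypothetical random positive data in place of subs: given two data frames, cor(x, y) correlates every column of x with every column of y. The result is square here only because both inputs have the same number of columns; it is generally not symmetric, and its diagonal holds cor(log(Vi), sqrt(Vi)), which is below 1.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
# Hypothetical positive data standing in for subs: 20 rows, 3 variables.
toy <- as.data.frame(matrix(runif(60, min = 1, max = 100), ncol = 3))

cm <- cor(log(toy), sqrt(toy), use = "complete.obs")

print(dim(cm))          # 3 3: rows are log(V1..V3), columns are sqrt(V1..V3)
print(isSymmetric(cm))  # FALSE: cor(log(Vi), sqrt(Vj)) != cor(log(Vj), sqrt(Vi))
print(diag(cm))         # high, but below 1
```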


Solution

  • I believe the problem is how you are trying to get rid of infinite values. The following ought to solve your problem:

    library(corrplot)
    
    ### Defining subs & onlyNum so the code is reproducible ###
    set.seed(1)                               # fixed seed for a repeatable example
    subs <- replicate(5, sample(1:100, 5))    # 5 x 5 matrix, each column drawn from 1:100
    onlyNum <- 1:5                            # all columns are numeric here
    
    ### To the answer ###
    logs = log(subs[,onlyNum])
    sqroots = sqrt(subs[,onlyNum])
    
    ### The 2 lines below should solve your issue ###
    logs[is.infinite(logs)] <- NA          # logical-matrix indexing on a matrix
    sqroots[is.infinite(sqroots)] <- NA
    
    corrplot(cor(logs,sqroots, use = "complete.obs"), order = "AOE")
    

    Here is the output yielded: [corrplot figure not available]


    Update

    Now that the data has been provided, an edit is in order.

    (1) As subs is a data frame, and is.infinite has no method for data frames, one would have to use sapply(logs, is.infinite) instead of is.infinite(logs), and likewise sapply(sqroots, is.infinite) for sqroots.
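A minimal sketch of point (1) on a hypothetical data frame: is.infinite() fails on the data frame itself, while sapply() builds a logical matrix that can be used to index the data frame directly.

```r
# Hypothetical toy data; log(0) gives -Inf in two places.
logs <- log(data.frame(a = c(0, 1, 10), b = c(4, 2, 0)))

# is.infinite() has no method for data frames (they are lists underneath).
failed <- inherits(try(is.infinite(logs), silent = TRUE), "try-error")
print(failed)  # TRUE

# sapply() applies is.infinite() per column and returns a logical matrix,
# which `[<-.data.frame` accepts as a mask.
inf_mask <- sapply(logs, is.infinite)
logs[inf_mask] <- NA
print(logs)
```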

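    (2) However, the actual issue lies in order = "AOE". "AOE" (angular order of the eigenvectors) is only defined when the eigenvalues, and hence the eigenvectors, are real, because it checks the sign of the first eigenvector's entries (this is the e1 > 0 in the error message; cf. ?corrMatOrder). Here cor(logs, sqroots) correlates one set of variables with another, so the matrix is not symmetric and its eigenvalues need not be real. Computing the eigenvalues of the correlation matrix yields: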

    > eigen(cor(logs,sqroots, use = "complete.obs"))$values
    [1] 2.35892882+0.0000000i 1.69884142+0.0000000i 1.16180544+0.0000000i
    [4] 0.99435961+0.0176823i 0.99435961-0.0176823i 0.89281529+0.0000000i
    [7] 0.32520739+0.0000000i 0.29605683+0.0000000i 0.05592473+0.0000000i
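A minimal illustration of point (2), using a hypothetical 2 x 2 matrix rather than the real data: a non-symmetric real matrix can have a complex-conjugate pair of eigenvalues, in which case eigen() returns complex eigenvectors, and comparing them with 0 (as e1 > 0 does in the error message) fails. A symmetric matrix always has real eigenvalues.

```r
# Hypothetical non-symmetric matrix: a 90-degree rotation, eigenvalues +i and -i.
m <- matrix(c(0, 1, -1, 0), nrow = 2)
ev <- eigen(m)
print(ev$values)               # a complex-conjugate pair
print(is.complex(ev$vectors))  # TRUE: the eigenvectors are complex as well

# Mirror the comparison that order = "AOE" makes on the first eigenvector.
e1 <- ev$vectors[, 1]
res <- tryCatch(e1 > 0, error = function(e) conditionMessage(e))
print(res)  # the "invalid comparison with complex values" message

# By contrast, a symmetric matrix (like an ordinary correlation matrix)
# has real eigenvalues.
s <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
print(is.complex(eigen(s)$values))  # FALSE
```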
    

    Therefore a different value for order has to be chosen, e.g. order = "FPC" (first principal component).
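Putting the pieces together, a sketch of the full pipeline under stated assumptions: subs is replaced by hypothetical random data with 9 columns (zeros included so that log() produces -Inf), infinite values are masked with sapply() as in point (1), and the plot uses order = "FPC" as suggested above. The corrplot call is skipped if the package is not installed.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
# Hypothetical stand-in for subs: 30 rows, 9 non-negative integer columns.
subs <- as.data.frame(matrix(sample(0:50, 270, replace = TRUE), ncol = 9))

logs <- log(subs)       # zeros become -Inf
sqroots <- sqrt(subs)

# Mask infinite values column-wise, as in point (1).
logs[sapply(logs, is.infinite)] <- NA
sqroots[sapply(sqroots, is.infinite)] <- NA  # no-op for this data

cm <- cor(logs, sqroots, use = "complete.obs")  # 9 x 9, not symmetric

# Plot with order = "FPC" instead of "AOE"; skipped if corrplot is missing.
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cm, order = "FPC")
}
```

Rows containing an NA in either logs or sqroots are dropped by use = "complete.obs", so with many zeros in the real data the effective sample size can shrink noticeably.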