Tags: python, macos, matrix, covariance, variance

How to calculate covariance on 2 columns out of multiple columns in python?


I've provided sample data below. It is a 10x8 matrix (10 rows, 8 columns) holding samples from two-dimensional normal distributions. For example, col1 and col2 form one set, col3 and col4 another, and so on. I'm trying to calculate the covariance of each individual set in Python. So far I've been unsuccessful, and I'm new to Python. Below is what I've tried:

import pandas
import numpy
import matplotlib.pyplot as plg    
data = pandas.read_excel("testfile.xlsx", header=None)
dataNpy = pandas.DataFrame.to_numpy(data)

mean = numpy.mean(dataNpy, axis=0)
dataAWithoutMean = dataNpy - mean
covB = numpy.cov(dataAWithoutMean)

print("cov is: " + str(covB))

I've been tasked with calculating 4 separate covariance matrices (one per set) and plotting the covariance value for each set. I also need to plot the variance of each set.

dataset:

5.583566716 -0.441667252 -0.663300181 -1.249623134 -6.530464227 -4.984165997 2.594874802 2.646629654
6.129721509 2.374902708 -2.583949571 -2.224729817 0.279965502 -0.850298098 -1.542499771 -2.686894831
5.793226266 1.133844629 -1.939493549 1.570726544 -2.125423302 -1.33966397 -0.42901856 -0.09814741
3.413049714 -0.1133744 -0.032092831 -0.122147373 2.063549449 0.685517481 5.887909556 4.056242954
-2.639701885 -0.716557389 -0.851273969 -0.522784614 -7.347432606 -2.653482175 1.043389849 0.774192416
-1.84827484 -0.636893709 -2.223488277 -1.227420764 0.253999505 0.540299783 -1.593071594 -0.70980532
0.754029441 1.427571018 5.486147486 2.956320758 2.054346142 1.939929175 -3.559875405 -3.074861749
2.009806308 1.916796155 7.820990369 2.953681659 2.071682641 0.105056782 -1.120995825 -0.036335483
1.875128481 1.785216268 -2.607698929 0.244415372 -0.793431956 -1.598343481 -2.120852679 -2.777871862
0.168442246 0.324606905 0.53741174 0.274617158 -2.99037756 -3.323958514 -3.288399345 -2.482277047

Thanks for helping in advance :)


Solution

  • Is this what you need?

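    import pandas
    import matplotlib.pyplot as plt

    # Same file as in the question; header=None because the sheet has no header row
    data = pandas.read_excel("testfile.xlsx", header=None)

    # Variance of each column. pandas centers the data internally,
    # so subtracting the mean first is not needed.
    variance = data.var()
    print(variance)

    # Covariance matrix of all 8 columns. The 2x2 blocks along the diagonal
    # correspond to the individual sets (col1/col2, col3/col4, ...).
    cov = data.cov()
    plt.matshow(cov)
    plt.colorbar()
    plt.show()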
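
If you need the four covariance matrices separately rather than one matrix over all columns, here is a minimal sketch of one way to do it with NumPy. It assumes each set is a pair of adjacent columns, as described in the question. The helper name `pairwise_covariances` is hypothetical, and the data is embedded as an array only so the example runs without the Excel file.

```python
import numpy as np

# The 10x8 sample from the question, embedded so the sketch is self-contained.
data = np.array([
    [5.583566716, -0.441667252, -0.663300181, -1.249623134, -6.530464227, -4.984165997, 2.594874802, 2.646629654],
    [6.129721509, 2.374902708, -2.583949571, -2.224729817, 0.279965502, -0.850298098, -1.542499771, -2.686894831],
    [5.793226266, 1.133844629, -1.939493549, 1.570726544, -2.125423302, -1.33966397, -0.42901856, -0.09814741],
    [3.413049714, -0.1133744, -0.032092831, -0.122147373, 2.063549449, 0.685517481, 5.887909556, 4.056242954],
    [-2.639701885, -0.716557389, -0.851273969, -0.522784614, -7.347432606, -2.653482175, 1.043389849, 0.774192416],
    [-1.84827484, -0.636893709, -2.223488277, -1.227420764, 0.253999505, 0.540299783, -1.593071594, -0.70980532],
    [0.754029441, 1.427571018, 5.486147486, 2.956320758, 2.054346142, 1.939929175, -3.559875405, -3.074861749],
    [2.009806308, 1.916796155, 7.820990369, 2.953681659, 2.071682641, 0.105056782, -1.120995825, -0.036335483],
    [1.875128481, 1.785216268, -2.607698929, 0.244415372, -0.793431956, -1.598343481, -2.120852679, -2.777871862],
    [0.168442246, 0.324606905, 0.53741174, 0.274617158, -2.99037756, -3.323958514, -3.288399345, -2.482277047],
])


def pairwise_covariances(values):
    """Return one 2x2 covariance matrix per adjacent column pair (0,1), (2,3), ..."""
    n_cols = values.shape[1]
    if n_cols % 2 != 0:
        raise ValueError("expected an even number of columns")
    # rowvar=False tells numpy.cov that columns are variables and rows are observations.
    return [np.cov(values[:, i:i + 2], rowvar=False) for i in range(0, n_cols, 2)]


cov_matrices = pairwise_covariances(data)
for k, c in enumerate(cov_matrices, start=1):
    # Diagonal entries are the variances, the off-diagonal entry is the covariance.
    print(f"set {k}: cov = {c[0, 1]:.4f}, var(x) = {c[0, 0]:.4f}, var(y) = {c[1, 1]:.4f}")
```

Note the `rowvar=False` argument: by default `numpy.cov` treats each row as a variable, which is probably why the attempt in the question did not give the expected result. To plot, you could pass the off-diagonal values (or the diagonals, for the variances) to `plt.bar`.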