Tags: python, dataset, correlation

How can I find a correlation coefficient between 2 datasets?


I am trying to find a way to get a correlation coefficient for two datasets. Dataset A is a table of (x, y) values that track one part of the body's movement over a set of frames. Dataset B is a similar table of (x, y) values for a different body part. I want to compare the two tables to get a correlation coefficient.

I have tried treating the two x columns (x1, x2) as an (x, y) pair to compute the correlation coefficient, but this is not a valid way of finding such a correlation, and I feel there has to be some program, application, or calculation I can use to find it.

I have 4 columns in Excel: 2 x-coordinate columns and 2 y-coordinate columns.


Solution

  • You can use the Pearson correlation coefficient provided by NumPy's np.corrcoef. The coefficient ranges from -1 (perfect negative linear correlation) to 1 (perfect positive linear correlation), and 0 means no linear correlation. Assuming each dataset is a 2D array in which each row is one frame's (x, y) pair (so column 0 holds the x values and column 1 the y values), and both arrays cover the same frames:

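    import numpy as np
    x_corr = np.corrcoef(A[:, 0], B[:, 0])[0, 1]  # corr coef between the two x columns
    y_corr = np.corrcoef(A[:, 1], B[:, 1])[0, 1]  # corr coef between the two y columns
    # pools x and y into one series; can be inflated when the x and y value ranges differ
    corr = np.corrcoef(A.flatten(), B.flatten())[0, 1]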
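To show what np.corrcoef computes, here is a minimal sketch of Pearson's r written out by hand (sum of the products of deviations from the mean, divided by the square root of the product of the summed squared deviations), checked against NumPy. The helper pearson_r and the sample arrays are hypothetical, for illustration only; it assumes A and B are (n_frames, 2) arrays with x in column 0 and y in column 1.

```python
import numpy as np


def pearson_r(u, v):
    """Pearson correlation between two equal-length 1D sequences (hypothetical helper)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if u.ndim != 1 or u.shape != v.shape:
        raise ValueError("u and v must be 1D arrays of the same length")
    du = u - u.mean()  # deviations from the mean of u
    dv = v - v.mean()  # deviations from the mean of v
    # r = sum(du * dv) / sqrt(sum(du^2) * sum(dv^2))
    return float(np.sum(du * dv) / np.sqrt(np.sum(du ** 2) * np.sum(dv ** 2)))


# Hypothetical data: 5 frames per body part, columns are (x, y)
A = np.array([[1.0, 2.0], [2.0, 2.5], [3.0, 3.1], [4.0, 3.9], [5.0, 5.2]])
B = np.array([[2.0, 9.0], [4.1, 7.9], [5.9, 7.2], [8.2, 6.1], [9.8, 5.0]])

x_r = pearson_r(A[:, 0], B[:, 0])
y_r = pearson_r(A[:, 1], B[:, 1])

# Each pair should agree up to floating-point rounding
print(x_r, np.corrcoef(A[:, 0], B[:, 0])[0, 1])
print(y_r, np.corrcoef(A[:, 1], B[:, 1])[0, 1])
```

In this made-up data the x coordinates rise together (r close to 1) while the y coordinates move in opposite directions (r close to -1), which is the kind of per-axis distinction a single pooled coefficient would hide.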
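Since the data starts as four Excel columns, one simple route is to save the sheet as CSV and load it with numpy.loadtxt. This is a sketch under assumptions: the column order A_x, A_y, B_x, B_y, the single header row, and the function names load_two_body_parts and per_axis_correlation are all hypothetical, so adjust the column slices to match the real sheet. (pandas.read_excel can read the .xlsx file directly, but it needs an engine such as openpyxl installed.)

```python
import io

import numpy as np


def load_two_body_parts(source):
    """Load a CSV export with columns A_x, A_y, B_x, B_y (hypothetical order).

    Returns two (n_frames, 2) arrays, one per body part, with x in
    column 0 and y in column 1.
    """
    data = np.loadtxt(source, delimiter=",", skiprows=1, ndmin=2)
    if data.shape[1] != 4:
        raise ValueError(f"expected 4 columns, got {data.shape[1]}")
    A = data[:, 0:2]  # body part A: (x, y) per frame
    B = data[:, 2:4]  # body part B: (x, y) per frame
    return A, B


def per_axis_correlation(A, B):
    """Pearson r between the two x columns and between the two y columns."""
    x_corr = np.corrcoef(A[:, 0], B[:, 0])[0, 1]
    y_corr = np.corrcoef(A[:, 1], B[:, 1])[0, 1]
    return float(x_corr), float(y_corr)


# Usage with an in-memory stand-in for the exported CSV file
csv_text = """A_x,A_y,B_x,B_y
1.0,2.0,2.0,9.0
2.0,2.5,4.1,7.9
3.0,3.1,5.9,7.2
4.0,3.9,8.2,6.1
5.0,5.2,9.8,5.0
"""
A, B = load_two_body_parts(io.StringIO(csv_text))
x_corr, y_corr = per_axis_correlation(A, B)
print(f"x: {x_corr:.3f}, y: {y_corr:.3f}")
```

With a real file, pass its path instead of the StringIO object, e.g. load_two_body_parts("movements.csv") (a hypothetical filename).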