Tags: python, numpy, vector, scipy

How do I convert a 2D column vector to a 1D vector for a pearsonr calculation in Python?


I am fairly new to Python and am struggling a bit with this.

In my code, xhand holds recorded data of a hand moving horizontally, and xpred is a prediction of the hand movement based on neural firing recorded at the same time as xhand. I am now trying to calculate the Pearson r value to determine whether the two are correlated.

These are the libraries I have imported:

import statistics as stats
import numpy as np
import matplotlib.pyplot as plt
import math
import scipy.stats
from scipy.stats import pearsonr

%matplotlib inline

To calculate Pearson's r for my data, this is what I tried:

correlation_coefficient, p_value = pearsonr(xhand, xpred)

When I ran this, I got the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-101-b9cb969c9e2c> in <cell line: 17>()
     15 
     16 # double check
---> 17 correlation_coefficient, p_value = pearsonr(xhand, xpred)
     18 print("Sum of squares correlation value:",r)
     19 print("Built in function correlation value:",correlation_coefficient)

1 frames
/usr/local/lib/python3.10/dist-packages/numpy/core/overrides.py in dot(*args, **kwargs)

ValueError: shapes (1000,1) and (1000,1) not aligned: 1 (dim 1) != 1000 (dim 0)

I figured the arrays needed to be flattened, so I ran this:

correlation_coefficient, p_value = pearsonr(xhand.flatten(), xpred.flatten())

and then got this error:

ValueError                                Traceback (most recent call last)
<ipython-input-103-5eec4d420c4b> in <cell line: 17>()
     15 
     16 # double check
---> 17 correlation_coefficient, p_value = pearsonr(xhand.flatten(), xpred.flatten())
     18 print("Sum of squares correlation value:",r)
     19 print("Built in function correlation value:",correlation_coefficient)

/usr/local/lib/python3.10/dist-packages/scipy/stats/_stats_py.py in pearsonr(x, y, alternative, method)
   4766 
   4767     if n < 2:
-> 4768         raise ValueError('x and y must have length at least 2.')
   4769 
   4770     x = np.asarray(x)

ValueError: x and y must have length at least 2.

I also tried xhand.reshape(-1), xpred.reshape(-1), and .ravel(), but I keep getting the same error.

How can I fix this? I don't know of any other ways to convert them from 2D to 1D.


Solution

  • What you're trying is correct if xhand and xpred are NumPy arrays.

    import numpy as np
    from scipy import stats
    rng = np.random.default_rng(483465834568457)
    xhand = rng.random(size=(1000, 1))
    xpred = rng.random(size=(1000, 1))
    
    stats.pearsonr(xhand, xpred)
    # On the SciPy version from the question: ValueError: shapes (1000,1) and (1000,1) not aligned: 1 (dim 1) != 1000 (dim 0)
    
    stats.pearsonr(xhand.ravel(), xpred.ravel())
    # PearsonRResult(statistic=-0.02435495881216112, pvalue=0.44170202221356253)
    
    stats.pearsonr(xhand.reshape(-1), xpred.reshape(-1))
    # PearsonRResult(statistic=-0.02435495881216112, pvalue=0.44170202221356253)
    
    stats.pearsonr(xhand.flatten(), xpred.flatten())
    # PearsonRResult(statistic=-0.02435495881216112, pvalue=0.44170202221356253)
    

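    As a comment mentions, this might not work if xhand and xpred are not plain NumPy arrays. An np.matrix, for example, is always two-dimensional, so reshape(-1), ravel(), and flatten() all return a matrix of shape (1, 1000) rather than a 1D array, and pearsonr then sees a length of 1. So consider explicitly converting them to arrays with np.asarray and verifying that the shape is correct before continuing.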

    xhand = np.matrix(rng.random(size=(1000, 1)))
    xpred = np.matrix(rng.random(size=(1000, 1)))
    stats.pearsonr(xhand.reshape(-1), xpred.reshape(-1))
    # ValueError: x and y must have length at least 2.
    
    xhand = np.asarray(xhand).ravel()
    xhand.shape # (1000,)
    xpred = np.asarray(xpred).ravel()
    xpred.shape # (1000,)
    stats.pearsonr(xhand, xpred)
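If the inputs may arrive as lists, column arrays, or np.matrix objects, the convert-and-check advice above can be wrapped in a small helper. This is only a sketch under assumptions: to_1d and pearson_1d are hypothetical names (not part of NumPy or SciPy), and the random data merely stands in for the real recordings.

```python
import numpy as np
from scipy import stats


def to_1d(values, name="x"):
    """Hypothetical helper: turn a list, ndarray, or np.matrix into a 1D float array."""
    # np.asarray converts an np.matrix (always 2D) into a plain ndarray,
    # so ravel() can actually produce a 1D result.
    arr = np.asarray(values, dtype=float).ravel()
    if arr.size < 2:
        raise ValueError(f"{name} must contain at least 2 values, got shape {arr.shape}")
    return arr


def pearson_1d(x, y):
    """Hypothetical wrapper: flatten both inputs, check lengths match, then call pearsonr."""
    x1 = to_1d(x, "x")
    y1 = to_1d(y, "y")
    if x1.shape != y1.shape:
        raise ValueError(f"length mismatch: {x1.shape} vs {y1.shape}")
    return stats.pearsonr(x1, y1)


rng = np.random.default_rng(0)
# Random stand-ins for the question's recordings (not real data).
xhand = np.matrix(rng.random(size=(1000, 1)))
xpred = np.matrix(rng.random(size=(1000, 1)))

print(xhand.reshape(-1).shape)  # (1, 1000): np.matrix stays 2D after reshape(-1)
result = pearson_1d(xhand, xpred)
print(result)
```

Raising early with the offending shape in the message points directly at an input that still carries a second dimension, instead of surfacing the less direct "length at least 2" error from pearsonr.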