python pandas pearson-correlation

Iterating over a DataFrame for a pearsonr test


I'm trying to loop through a DataFrame, starting at the second column, to run a pearsonr test on the returns. The dataset is just NVIDIA (NVDA) data from Yahoo Finance.

import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv('NVDA.csv', dtype={'label': str})

for column in df.loc[:, 0:3]:
    pearson_coefficient, p_value = pearsonr(column, df['Volume'])
print('Pearson Coefficient: ', pearson_coefficient)

Solution

  • Consider this mini-example:

    In [10]: df = pd.DataFrame(np.random.randint(10, size=(6,4)))
    
    In [11]: [col for col in df.loc[:, 0:3]]
    Out[11]: [0, 1, 2, 3]
    

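    Notice that a loop of the form for col in df iterates over the column labels, not over the column values as Series. In your loop, column is therefore a label such as 'Open', and pearsonr receives that string instead of the data. Also note that df.loc slices by label, so df.loc[:, 0:3] works here only because the labels are the integers 0 to 3; on a DataFrame read from the Yahoo Finance CSV, whose labels are strings, that integer slice raises a TypeError. Instead, take the labels from df.columns and use each one to index the DataFrame: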

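    for column in df.columns[1:4]:  # start at the second column (typically 'Date' is first)
        pearson_coefficient, p_value = pearsonr(df[column], df['Volume'])
        print('Pearson Coefficient: ', pearson_coefficient)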
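A fuller sketch of the same fix, assuming the CSV has Yahoo Finance's usual Date, Open, High, Low, Close, Adj Close, Volume columns and that "returns" means day-over-day percentage changes computed with pct_change. Since NVDA.csv isn't available here, the example builds a small synthetic DataFrame of random prices and volumes, so the printed numbers are meaningless and only show the loop working. The names price_cols and results are illustrative.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic stand-in for NVDA.csv (assumption: Yahoo Finance's usual
# Date, Open, High, Low, Close, Adj Close, Volume column layout).
rng = np.random.default_rng(0)
n = 50
close = 100 + rng.normal(0, 1, n).cumsum()
df = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=n, freq="D").strftime("%Y-%m-%d"),
    "Open": close + rng.normal(0, 0.5, n),
    "High": close + 1.0,
    "Low": close - 1.0,
    "Close": close,
    "Adj Close": close,
    "Volume": rng.integers(1_000_000, 5_000_000, n),
})

# Labels from the second column onward (skipping the non-numeric Date),
# excluding Volume itself.
price_cols = [c for c in df.columns[1:] if c != "Volume"]

# Day-over-day returns; pct_change leaves NaN in the first row, so drop it
# and align Volume to the remaining rows.
returns = df[price_cols].pct_change().dropna()
volume = df["Volume"].loc[returns.index]

results = {}
for column in price_cols:
    # Index with the label to get the column values as a Series.
    r, p = pearsonr(returns[column], volume)
    results[column] = (r, p)
    print(f"{column}: Pearson coefficient = {r:.3f}, p-value = {p:.3f}")
```

To use it on the real file, replace the synthetic DataFrame with df = pd.read_csv('NVDA.csv'). If only the coefficients are needed, returns.corrwith(volume) computes them in one call, but it does not give p-values.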