Tags: python, pandas, matrix, correlation, pearson

Most efficient way to calculate correlation matrix in python


I need to calculate the sales correlation between 5000 products, which results in a 5000 by 5000 correlation matrix. I am trying to do this in pandas with df.corr(), but it is causing memory issues. Are there more efficient ways to achieve this?
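For a sense of scale, the result by itself is large: a dense 5000 by 5000 matrix of 64-bit floats needs about 200 MB, and half that in 32-bit floats. The sketch below just does that arithmetic. The helper name corr_matrix_bytes is hypothetical, and the figure covers only the output matrix, not any working memory the calculation itself uses.

```python
import numpy as np

def corr_matrix_bytes(n_products: int, dtype=np.float64) -> int:
    """Hypothetical helper: bytes needed for one dense n x n matrix of `dtype`."""
    return n_products * n_products * np.dtype(dtype).itemsize

# 5000 products -> 25,000,000 matrix entries
print(corr_matrix_bytes(5000) / 1e6, "MB as float64")
print(corr_matrix_bytes(5000, np.float32) / 1e6, "MB as float32")
```

This prints 200.0 MB for float64 and 100.0 MB for float32.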


Solution

  • Use np.corrcoef. I was able to compute the full matrix in under a minute with it.
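Here is a minimal sketch of the np.corrcoef approach. It assumes the sales data is in a DataFrame with one column per product and one row per time period. That layout and the random data are hypothetical, and the product count is set to 1000 to keep the example light, though the same code applies at 5000. The key detail is rowvar=False: by default np.corrcoef treats each row as a variable, so without it you would get correlations between days rather than between products.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 365 days (rows) x N_PRODUCTS products (columns).
# The question has 5000 products; a smaller number keeps this example quick.
N_DAYS, N_PRODUCTS = 365, 1000
rng = np.random.default_rng(0)
sales = pd.DataFrame(
    rng.random((N_DAYS, N_PRODUCTS)),
    columns=[f"product_{i}" for i in range(N_PRODUCTS)],
)

# np.corrcoef treats each ROW as a variable by default; rowvar=False makes
# each column (product) a variable, matching what df.corr() computes.
corr = np.corrcoef(sales.to_numpy(), rowvar=False)

# Optional: put the product labels back on both axes.
corr_df = pd.DataFrame(corr, index=sales.columns, columns=sales.columns)
print(corr_df.shape)  # (1000, 1000)
```

One caveat when switching: DataFrame.corr() excludes missing values pairwise, while np.corrcoef propagates NaN. Any product with a missing value will get NaN correlations unless the gaps are filled or dropped first.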