Search code examples
pythonnumpypearson-correlation

Python (numpy) - correlate two binned plots


My question is how do I correlate my two binned plots and output a Pearson's correlation coefficient?

I'm not sure how to properly extract the binned arrays necessary for the np.corrcoef function. Here's my script:

import numpy as np 
import matplotlib.pyplot as plt

A = np.genfromtxt('data1.txt') 
x1 = A[:,1] 
y1 = A[:,2]

B=np.genfromtxt('data2.txt') 
x2 = B[:,1] 
y2 = B[:,2]

fig = plt.figure() 
plt.subplots_adjust(hspace=0.5) 
plt.subplot(121) 
AA = plt.hexbin(x1,y1,cmap='jet',gridsize=500,vmin=0,vmax=450,mincnt=1) 
plt.axis([-180,180,-180,180]) 
cb = plt.colorbar() 
plt.title('Data1')

plt.subplot(122) 
BB = plt.hexbin(x2,y2,cmap='jet',gridsize=500,vmin=0,vmax=450,mincnt=1) 
plt.axis([-180,180,-180,180]) 
cb = plt.colorbar() 
plt.title('Data 2')

array1 = np.ndarray.flatten(AA)  
array2 = np.ndarray.flatten(BB)

print np.corrcoef(array1,array2)

plt.show()

Solution

  • The answer can be found in the documentation:

    Returns: object

    a PolyCollection instance; use get_array() on this PolyCollection to get the counts in each hexagon.

    Here's a revised version of you code:

    A = np.genfromtxt('data1.txt') 
    x1 = A[:,1] 
    y1 = A[:,2]
    
    B = np.genfromtxt('data2.txt') 
    x2 = B[:,1] 
    y2 = B[:,2]
    
    # make figure and axes
    fig, (ax1, ax2) = plt.subplots(1, 2)
    
    # define common keyword arguments
    hex_params = dict(cmap='jet', gridsize=500, vmin=0, vmax=450, mincnt=1)
    
    # plot and set titles   
    hex1 = ax1.hexbin(x1, y1, **hex_params) 
    hex2 = ax2.hexbin(x2, y2, **hex_params) 
    ax1.set_title('Data 1')
    ax2.set_title('Data 2')
    
    # set axes lims
    [ax.set_xlim(-180, 180) for ax in (ax1, ax2)]
    [ax.set_ylim(-180, 180) for ax in (ax1, ax2)]
    
    # add single colorbar
    fig.subplots_adjust(right=0.8, hspace=0.5)
    cbar_ax = fig.add_axes([0.85, 0.15, 0.05, 0.7])
    fig.colorbar(hex2, cax=cbar_ax)
    
    # get binned data and corr coeff
    binned1 = hex1.get_array()
    binned2 = hex2.get_array()    
    print np.corrcoef(binned1, binned2)
    
    plt.show()
    

    Two comments though: are you sure you want the pearson correlation coefficient? What are you actually trying to show? If you want to show the distributions are the same/different, you might want to use a Kolmogorov-Smirnov test.

    Also don't use jet as a colormap. Jet is bad.