python matplotlib bar-chart histogram2d

2D histogram of events is misaligned with 1D bar charts of event probability along the x and y axes (Python, matplotlib)


I would like to plot a 2d histogram using matplotlib in order to visualize the influence of two variables on the occurrence of an event.

In my test case, the event is “wish coming true” and the variable x is the number of falling stars and y is the involvement of a fairy godmother. What I would like to do is to plot a 2d histogram of wishes coming true for bins of falling stars and fairy godmothers. Then next to each axis, I would like to show the probability of a wish coming true, event/(event+nonevent), for each bin of falling stars and fairy godmothers (1D bar chart containing probabilities for each histogram bin). The bar chart bins should correspond to and be aligned with the 2d histogram bins. However, there seems to be a slight misalignment between the bar charts and the histogram bins.

To align the bar charts correctly, is it enough to set the axis limits to the first and last bin edges? Once these limits are set, can I pass bin centers to plt.bar() as positions along the axis rather than as indices?

My code and the resulting images are as follows:

import numpy as np
import matplotlib.pyplot as plt
from numpy import linspace
import cubehelix

# Create random events and non-events
x_noneve = 3.*np.random.randn(10000) +22.
np.random.seed(seed=41)

y_noneve = np.random.randn(10000)
np.random.seed(seed=45)

x_eve = 3.*np.random.randn(1000) +22.
np.random.seed(seed=33)

y_eve = np.random.randn(1000)

x_all = np.concatenate((x_eve,x_noneve),axis=0)
y_all = np.concatenate((y_eve,y_noneve),axis=0)

# Set up default x and y limits
xlims = [min(x_all),max(x_all)]
ylims = [min(y_all),max(y_all)]

# Set up your x and y labels
xlabel = 'Falling Star'
ylabel = 'Fairy Godmother'

# Define the locations for the axes
left, width = 0.12, 0.55
bottom, height = 0.12, 0.55
bottom_h = left_h = left+width+0.03

# Set up the geometry of the three plots
rect_wishes = [left, bottom, width, height]  # dimensions of wish plot
rect_histx  = [left, bottom_h, width, 0.25]  # dimensions of x-histogram
rect_histy  = [left_h, bottom, 0.25, height] # dimensions of y-histogram

# Set up the size of the figure
fig = plt.figure(1, figsize=(9.5,9))
fig.suptitle('Wishes coming true', fontsize=18, fontweight='bold')

cx1 = cubehelix.cmap(startHue=240,endHue=-300,minSat=1,maxSat=2.5,minLight=.3,maxLight=.8,gamma=.9)

# Make the three plots
axWishes = plt.axes(rect_wishes) # wishes plot
axStarx = plt.axes(rect_histx)   # x bar chart  
axFairy = plt.axes(rect_histy)   # y bar chart 

# Define the number of bins
nxbins = 50
nybins = 50
nbins = 100

xbins = linspace(start = xlims[0], stop = xlims[1], num = nxbins)
ybins = linspace(start = ylims[0], stop = ylims[1], num = nybins)
xcenter = (xbins[0:-1]+xbins[1:])/2.0
ycenter = (ybins[0:-1]+ybins[1:])/2.0

delx    = np.around(xbins[1]-xbins[0], decimals=2,out=None)
dely    = np.around(ybins[1]-ybins[0], decimals=2,out=None)

H, xedges,yedges = np.histogram2d(y_eve,x_eve,bins=(ybins,xbins))
X = xcenter
Y = ycenter
H = np.where(H==0,np.nan,H) # Remove 0's from plot

# Plot the 2D histogram
cax = (axWishes.imshow(H, extent=[xlims[0],xlims[1],ylims[0],ylims[1]],
       interpolation='nearest', origin='lower',aspect="auto",cmap=cx1))

#Plot the axes labels
axWishes.set_xlabel(xlabel,fontsize=14)
axWishes.set_ylabel(ylabel,fontsize=14)

#Set up the plot limits
axWishes.set_xlim(xlims)
axWishes.set_ylim(ylims)

#Set up the probability bins
x_eve_hist, xoutbins    = np.histogram(x_eve, bins=xbins) 
y_eve_hist, youtbins    = np.histogram(y_eve, bins=ybins) 

x_noneve_hist, xoutbins    = np.histogram(x_noneve, bins=xbins) 
y_noneve_hist, youtbins    = np.histogram(y_noneve, bins=ybins) 

probax = [eve/(eve+noneve+0.0) if eve+noneve>0 else 0 for eve,noneve in zip(x_eve_hist,x_noneve_hist)]
probay = [eve/(eve+noneve+0.0) if eve+noneve>0 else 0 for eve,noneve in zip(y_eve_hist,y_noneve_hist)]

probax = probax/np.sum(probax)
probay = probay/np.sum(probay)

probax = np.round(probax*100., decimals=0, out=None)
probay = np.round(probay*100., decimals=0, out=None)

#Plot the bar charts  

#Set up the limits
axStarx.set_xlim( xlims[0], xlims[1])
axFairy.set_ylim( ylims[0], ylims[1])

axStarx.bar(xcenter, probax, align='center', width =delx, color = 'royalblue')
axFairy.barh(ycenter,probay,align='center', height=dely, color = 'mediumorchid')

#Show the plot
plt.show()

[Image: resulting image]

[Image: hex version]


Solution

  • While my original code was functional, the limits of the 2D histogram and of the bar charts were not defined from the histogram bins, so any change to the bins produced a poorly aligned graph. To ensure that the limits of the graph always correspond to the limits of the histogram bins, I changed

    cax = (axWishes.imshow(H, extent=[xlims[0],xlims[1],ylims[0],ylims[1]],
           interpolation='nearest', origin='lower',aspect="auto",cmap=cx1))
    

    to

    cax = (axWishes.imshow(H, extent=[xbins[0],xbins[-1],ybins[0],ybins[-1]],
           interpolation='nearest', origin='lower',aspect="auto",cmap=cx1))
    

    and

    axStarx.set_xlim( xlims[0], xlims[1])
    axFairy.set_ylim( ylims[0], ylims[1])
    

    to

    axStarx.set_xlim(axWishes.get_xlim()) 
    axFairy.set_ylim(axWishes.get_ylim())
    

    For information, a bar chart accepts either indices or values along the axis as bar locations. When the bars represent bins rather than categorical variables, it is important to set the axis limits and define the bar width correctly. A histogram plot does both automatically. However, if you want to show a quantity other than the number of members per bin, you must use a bar chart and set the limits by hand.
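
The points above can be put together in a small, self-contained sketch. It is not the code from the question: the layout uses plt.subplots instead of hand-placed axes, the default colormap replaces cubehelix, the helper bin_probability is a hypothetical name, and the bars are anchored on the left bin edges with align='edge' rather than on the bin centers. Either placement should line up, provided the widths are the exact (unrounded) bin widths and the bar-chart limits are copied from the 2D histogram axes.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Synthetic events and non-events, similar in spirit to the question's data
rng = np.random.default_rng(0)
x_eve = 3.0 * rng.standard_normal(1000) + 22.0
y_eve = rng.standard_normal(1000)
x_noneve = 3.0 * rng.standard_normal(10000) + 22.0
y_noneve = rng.standard_normal(10000)
x_all = np.concatenate((x_eve, x_noneve))
y_all = np.concatenate((y_eve, y_noneve))

# n bins need n + 1 edges
nbins = 20
xbins = np.linspace(x_all.min(), x_all.max(), nbins + 1)
ybins = np.linspace(y_all.min(), y_all.max(), nbins + 1)


def bin_probability(events, nonevents, edges):
    """Return event / (event + nonevent) per bin, 0 where a bin is empty."""
    n_eve, _ = np.histogram(events, bins=edges)
    n_non, _ = np.histogram(nonevents, bins=edges)
    total = n_eve + n_non
    return np.divide(n_eve, total, out=np.zeros(len(total)), where=total > 0)


probx = bin_probability(x_eve, x_noneve, xbins)
proby = bin_probability(y_eve, y_noneve, ybins)

fig, axes = plt.subplots(2, 2, figsize=(8, 8))
ax_top, ax_main, ax_right = axes[0, 0], axes[1, 0], axes[1, 1]
axes[0, 1].axis("off")  # unused corner

# 2D histogram of events; the extent comes from the bin edges themselves
H, _, _ = np.histogram2d(y_eve, x_eve, bins=(ybins, xbins))
ax_main.imshow(H, extent=[xbins[0], xbins[-1], ybins[0], ybins[-1]],
               interpolation="nearest", origin="lower", aspect="auto")
ax_main.set_xlim(xbins[0], xbins[-1])
ax_main.set_ylim(ybins[0], ybins[-1])

# Bars start at the left/lower bin edge and span the exact bin width
xbars = ax_top.bar(xbins[:-1], probx, width=np.diff(xbins), align="edge")
ybars = ax_right.barh(ybins[:-1], proby, height=np.diff(ybins), align="edge")

# Copy the limits from the 2D histogram axes, as in the solution above
ax_top.set_xlim(ax_main.get_xlim())
ax_right.set_ylim(ax_main.get_ylim())

fig.savefig("aligned_bars.png")
```

With align='edge', each bar covers exactly one bin, so the bar boundaries coincide with the pixel boundaries of the image whenever the two axes have the same limits and the same width on the figure.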