python numpy histogram histogram2d

Access first two dimensions of N-dimensional histogram


I generate an N-dimensional histogram via numpy's histogramdd, where the number of dimensions varies between 2 and 5. I need to plot a 2-dimensional histogram of its first two dimensions.

When there are only two dimensions, I can do that easily, as shown below. How can I generalize this code to N dimensions so that it will always plot the first two?

import numpy as np
import matplotlib.pyplot as plt


dims = np.random.randint(2, 6)  # upper bound is exclusive, so this draws 2 to 5
print('Dimensions: {}'.format(dims))
N_pts = np.random.randint(100, 500)
print('Points: {}'.format(N_pts))

A_pts, bin_edges = [], []
for _ in range(dims):
    d_min, d_max = np.random.uniform(-1., 1.), np.random.uniform(-1., 1.)
    sample = np.random.uniform(d_min, d_max, N_pts)
    A_pts.append(sample)
    # Define bin edges separately, since they come from somewhere else.
    bin_edges.append(np.histogram(sample, bins='auto')[1])

# Obtain N-dimensional histogram
A_h = np.histogramdd(A_pts, bins=bin_edges)[0]
print(np.shape(A_h))

# Subplots.
fig = plt.figure()
ax0 = fig.add_subplot(1, 2, 1)
ax1 = fig.add_subplot(1, 2, 2)

# 2D histogram x,y ranges
x_extend = [min(A_pts[0]), max(A_pts[0])]
y_extend = [min(A_pts[1]), max(A_pts[1])]

# Scatter plot for A.
ax0.invert_yaxis()
ax0.set_xlim(x_extend)
ax0.set_ylim(y_extend)
ax0.scatter(A_pts[0], A_pts[1], c='b', label='A')
for x_ed in bin_edges[0]:
    # vertical lines
    ax0.axvline(x_ed, linestyle=':', color='k', zorder=1)
for y_ed in bin_edges[1]:
    # horizontal lines
    ax0.axhline(y_ed, linestyle=':', color='k', zorder=1)

# 2D histogram.
# Grid for pcolormesh, using first two dimensions
X, Y = np.meshgrid(bin_edges[0], bin_edges[1])
HA = np.rot90(A_h)
HA = np.flipud(HA)
ax1.pcolormesh(X, Y, HA, cmap=plt.cm.Blues)

# Manipulate axis and ranges.
ax1.invert_yaxis()
ax1.set_xlim(x_extend)
ax1.set_ylim(y_extend)

fig.subplots_adjust(hspace=1)
plt.show()

Solution

  • You must first decide what exactly you mean by "the first two dimensions of the histogram". To make this intuitive, imagine you started from the 2d histogram in your example and wanted to reduce it to its first dimension.

    You can see that there are two obvious possibilities.

    • pick a column or
    • sum along the rows to get one summary column

    Of course, the summary solution gives you just the 1d histogram of the original data.
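    The summary option can be checked in the 2d case with a minimal sketch. The data, the fixed seed, and the names (x_edges, y_edges, hist2d, hist1d, summed) are illustrative assumptions, not part of the original code. The equality relies on every point falling inside the bin range of the axis being summed away.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, only for reproducibility
x = rng.uniform(-1.0, 1.0, 200)
y = rng.uniform(-1.0, 1.0, 200)

# Illustrative bin edges that cover the full data range on both axes.
x_edges = np.linspace(-1.0, 1.0, 6)
y_edges = np.linspace(-1.0, 1.0, 4)

hist2d = np.histogram2d(x, y, bins=[x_edges, y_edges])[0]
hist1d = np.histogram(x, bins=x_edges)[0]

# Summing over the second axis collapses y and leaves the counts per x bin.
summed = hist2d.sum(axis=1)
print(np.array_equal(summed, hist1d))
```

    Picking a single column instead (for example hist2d[:, 0]) would only count the points whose y value falls in that one y bin.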

    So, for your code:

    If you want to sum over the remaining dimensions:

    A_2d = A_h.reshape(A_h.shape[:2] + (-1,)).sum(axis=-1)
    

    If you want to slice:

    A_2d = A_h.reshape(A_h.shape[:2] + (-1,))[..., 0]
    

    The reshape keeps the first two dimensions separate and ravels all the others (the (-1,) instructs reshape to put everything that's left into that last axis). The resulting array has the merged axes as its last dimension. The first line sums along this axis; the second line picks just one slice. You could pick a different slice by changing 0 to some other integer.
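    Both reductions can be tried end to end with a short sketch on a 4-dimensional sample. The sample, the fixed seed, the bin counts, and the names sample, edges, flat, A_2d_sum, A_2d_slice, A_2d_alt and ref are assumptions made for illustration. The sketch also compares the summed result with np.histogram2d on the first two coordinates, and with an equivalent sum over all trailing axes that skips the reshape.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, only for reproducibility
dims, n_pts = 4, 300
sample = rng.uniform(-1.0, 1.0, size=(n_pts, dims))

# Illustrative bin edges: 5, 4, 3 and 2 bins along the four axes.
edges = [np.linspace(-1.0, 1.0, n + 1) for n in (5, 4, 3, 2)]

A_h = np.histogramdd(sample, bins=edges)[0]    # shape (5, 4, 3, 2)

# Keep the first two axes, merge all remaining axes into one.
flat = A_h.reshape(A_h.shape[:2] + (-1,))      # shape (5, 4, 6)

A_2d_sum = flat.sum(axis=-1)    # sum over the remaining dimensions
A_2d_slice = flat[..., 0]       # one slice, here A_h[:, :, 0, 0]

# Equivalent to the reshape-and-sum: sum directly over axes 2 and up.
A_2d_alt = A_h.sum(axis=tuple(range(2, A_h.ndim)))

# Reference: 2d histogram of the first two coordinates only.
ref = np.histogram2d(sample[:, 0], sample[:, 1], bins=edges[:2])[0]

print(A_2d_sum.shape)
print(np.array_equal(A_2d_sum, ref), np.array_equal(A_2d_sum, A_2d_alt))
```

    The same lines also work when the histogram is already 2-dimensional: the reshape then just adds a trailing axis of length 1, so both reductions return the original array. Either A_2d_sum or A_2d_slice can then be passed to the plotting code in place of A_h.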