
Combine two 2D datasets into a single two-dimensional histogram matrix with a shared scale


I have two datasets, d1 and d2, filled with 2D data on different scales:

import numpy as np
d1 = np.random.normal(-30,20,(500,2))
d2 = np.random.normal(-40,10,(500,2))

I can also create a separate 100 x 100 2D histogram (i.e. a grayscale image) from each dataset:

bins = [100,100]
h1 = np.histogram2d(d1[:,0], d1[:,1], bins)[0]
h2 = np.histogram2d(d2[:,0], d2[:,1], bins)[0]

However, with this approach each histogram's bins span only the range of its own data, so each histogram is effectively centered on its own data. When I plot both histograms on top of each other, they appear to be distributed around the same center, which is not actually the case.
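To see why the overlay is misleading, the following sketch inspects the bin edges that histogram2d returns alongside the counts. It assumes synthetic data like the question's, drawn with np.random.default_rng and a fixed seed (an addition for reproducibility). With an integer bin count, each call derives its edges from its own data's minimum and maximum, so index (i, j) refers to a different region of the plane in h1 than in h2.

```python
import numpy as np

# Synthetic data shaped like the question's; the fixed seed is an assumption
# added only so the run is reproducible.
rng = np.random.default_rng(0)
d1 = rng.normal(-30, 20, (500, 2))
d2 = rng.normal(-40, 10, (500, 2))

bins = [100, 100]
# histogram2d returns a tuple: (counts, x_edges, y_edges)
h1, x1_edges, y1_edges = np.histogram2d(d1[:, 0], d1[:, 1], bins)
h2, x2_edges, y2_edges = np.histogram2d(d2[:, 0], d2[:, 1], bins)

# Each call spans its own data's min..max, so the edges do not line up.
print("d1 x-range:", x1_edges[0], "to", x1_edges[-1])
print("d2 x-range:", x2_edges[0], "to", x2_edges[-1])
print("same x edges?", np.array_equal(x1_edges, x2_edges))
```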

What I want is a single 100 x 100 x 2 histogram matrix (comparable to a 2-channel image) that takes the different scales of the data into account, so the displacement between the datasets is not lost.


Solution

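  • If you pass histogram2d bins=[100, 100], you are asking it to calculate 100 equal-width bins in each dimension, spanning from the minimum to the maximum of the data. 100 bins are delimited by 101 edges, and you can compute those edges yourself, so these two

    bins = [
        np.linspace(x.min(), x.max(), 101),
        np.linspace(y.min(), y.max(), 101)
    ]
    h1 = np.histogram2d(x, y, bins)[0]


    and

    bins = [100, 100]
    h1 = np.histogram2d(x, y, bins)[0]


    are equivalent. (histogram2d returns a tuple (H, xedges, yedges); the [0] selects the count matrix H.)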

    Knowing that, we can compute the bin edges from the combined range of both arrays and pass the same edges to both calls:

    # 101 edges -> 100 bins per dimension, shared by both datasets
    bins = [
        np.linspace(
            min(d1[:, 0].min(), d2[:, 0].min()),
            max(d1[:, 0].max(), d2[:, 0].max()),
            101
        ),
        np.linspace(
            min(d1[:, 1].min(), d2[:, 1].min()),
            max(d1[:, 1].max(), d2[:, 1].max()),
            101
        )
    ]
    h1 = np.histogram2d(d1[:, 0], d1[:, 1], bins)[0]
    h2 = np.histogram2d(d2[:, 0], d2[:, 1], bins)[0]
    

    Or stack the two datasets together to simplify the code a bit:

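    # shape (2, 500, 2); requires d1 and d2 to have the same shape
    d = np.stack((d1, d2))

    bins = [
        np.linspace(d[..., 0].min(), d[..., 0].max(), 101),
        np.linspace(d[..., 1].min(), d[..., 1].max(), 101),
    ]

    h1 = np.histogram2d(d[0, :, 0], d[0, :, 1], bins)[0]
    h2 = np.histogram2d(d[1, :, 0], d[1, :, 1], bins)[0]

    # combine into a single 100 x 100 x 2 matrix, one channel per dataset
    h = np.stack((h1, h2), axis=-1)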
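The same shared-edge idea can also be expressed through histogram2d's range parameter, which fixes the outer edges per dimension while still letting NumPy compute the equal-width bins. A minimal sketch, assuming synthetic data like the question's (the seed is added for reproducibility), that builds the requested 100 x 100 x 2 array and cross-checks it against explicit np.linspace edges:

```python
import numpy as np

# Synthetic data shaped like the question's; the seed is an added assumption.
rng = np.random.default_rng(42)
d1 = rng.normal(-30, 20, (500, 2))
d2 = rng.normal(-40, 10, (500, 2))

n_bins = 100

# Combined range per dimension over both datasets: [[xmin, xmax], [ymin, ymax]]
both = np.concatenate((d1, d2))  # shape (1000, 2)
shared_range = [
    [both[:, 0].min(), both[:, 0].max()],
    [both[:, 1].min(), both[:, 1].max()],
]

# An integer bin count plus `range` gives n_bins equal-width bins between
# the given limits, identical for both calls.
h1, xedges, yedges = np.histogram2d(
    d1[:, 0], d1[:, 1], bins=n_bins, range=shared_range
)
h2, _, _ = np.histogram2d(
    d2[:, 0], d2[:, 1], bins=n_bins, range=shared_range
)

# Stack the two count matrices as channels.
h = np.stack((h1, h2), axis=-1)
print(h.shape)  # (100, 100, 2)

# Cross-check against explicitly computed edges (n_bins + 1 per dimension).
explicit_bins = [np.linspace(lo, hi, n_bins + 1) for lo, hi in shared_range]
h1_explicit = np.histogram2d(d1[:, 0], d1[:, 1], bins=explicit_bins)[0]
print(np.array_equal(h1, h1_explicit))
```

Because both channels share xedges and yedges, h[i, j, 0] and h[i, j, 1] count points from the same region of the plane, so overlaying the channels preserves the displacement between the datasets.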