Search code examples
pythonnumpymultidimensional-arrayhistogram

Different considerations when defining 3D bins vs 1D bins for a histogram?


I'm trying to create a 3D histogram of some data I have but I think there must be something wrong with how I'm defining the bins because I'm getting a lot of empty marginal histograms. The following is how I'm creating the histogram and how I'm checking the values --

# Hard boundaries for the cube 
param1_range = [0, 6]
param2_range = [-2, +0.5]
param3_range = [0, 2]

d_param1 = 0.5
param1_bins = np.arange(param1_range[0], param1_range[1], d_param1)
N_param1_bins = len(d_param1) - 1

d_param2 = 0.5
param2_bins = np.arange(param2_range[0], param2_range[1], d_param2)
N_param2_bins = len(d_param2) - 1

d_param3 = 0.25
param3_bins = np.arange(param3_range[0], param3_range[1], d_param3)
N_param3_bins = len(d_param3) - 1


empty_param1, empty_param2, empty_param3 = 0, 0, 0 
cube_full = np.zeros((Nobs, N_param1_bins, N_param2_bins, N_param3_bins))
for i in range(Nobs):
    tmp = np.vstack((param1_data[i,:], param2_data[i,:], param3_data[i,:])).T
    hist, _ = np.histogramdd(tmp[:,:],
                             bins=[param1_bins, param2_bins, param3_bins], 
                             density=True)
    cube_full[i,:,:,:] = hist

    if np.all(hist[:,0,0] == 0.):
              empty_param1 = empty_param1 + 1
    if np.all(hist[0,:,0] == 0.):
              empty_param2 = empty_param2 + 1
    if np.all(hist[0,0,:] == 0.):
              empty_param3 = empty_param3 + 1

print(empty_param1/Nobs, empty_param2/Nobs, empty_param3/Nobs)

About half of the marginal histograms for "param1" are empty and nearly all for "param2" and "param3".

I determined the parameter ranges from looking at my data. I also created 1D histograms for the different parameters using these same ranges and don't get empty bins like when I try to make the 3D histogram. Is there some additional consideration that I'm missing when defining 3D bins versus 1D bins?


Solution

  • I'm interpreting here, but from the comments, I THINK what you want is this:

        sum1 = np.sum(hist, (1,2))
        if np.all(sum1) == 0:
            empty_param1 += 1
        sum2 = np.sum(hist, (0,2))
        sum3 = np.sum(hist, (0,1))
    

    This essentially produces the 1D histograms from the 3D data. If I've still missed the mark, maybe you can provide a example using smaller matrices.

    Example:

    >>> x = np.arange(125).reshape((5,5,5))
    >>> x
    array([[[  0,   1,   2,   3,   4],
            [  5,   6,   7,   8,   9],
            [ 10,  11,  12,  13,  14],
            [ 15,  16,  17,  18,  19],
            [ 20,  21,  22,  23,  24]],
    
           [[ 25,  26,  27,  28,  29],
            [ 30,  31,  32,  33,  34],
            [ 35,  36,  37,  38,  39],
            [ 40,  41,  42,  43,  44],
            [ 45,  46,  47,  48,  49]],
    
           [[ 50,  51,  52,  53,  54],
            [ 55,  56,  57,  58,  59],
            [ 60,  61,  62,  63,  64],
            [ 65,  66,  67,  68,  69],
            [ 70,  71,  72,  73,  74]],
    
           [[ 75,  76,  77,  78,  79],
            [ 80,  81,  82,  83,  84],
            [ 85,  86,  87,  88,  89],
            [ 90,  91,  92,  93,  94],
            [ 95,  96,  97,  98,  99]],
    
           [[100, 101, 102, 103, 104],
            [105, 106, 107, 108, 109],
            [110, 111, 112, 113, 114],
            [115, 116, 117, 118, 119],
            [120, 121, 122, 123, 124]]])
    >>> np.sum(x,(1,2))
    array([ 300,  925, 1550, 2175, 2800])
    >>> np.sum(x,(0,2))
    array([1300, 1425, 1550, 1675, 1800])
    >>> np.sum(x,(0,1))
    array([1500, 1525, 1550, 1575, 1600])
    >>>