
Averaging Data in Bins


I have two lists: one is a depth list and the other is a chlorophyll list, and their elements correspond to each other. I want to average the chlorophyll data over every 0.5 m of depth.

chl  = [0.4,0.1,0.04,0.05,0.4,0.2,0.6,0.09,0.23,0.43,0.65,0.22,0.12,0.2,0.33]
depth = [0.1,0.3,0.31,0.44,0.49,1.1,1.145,1.33,1.49,1.53,1.67,1.79,1.87,2.1,2.3]

The depth bins are not always equal in length, and the depths do not always start at 0.0 or fall on 0.5 m intervals. The chlorophyll data always corresponds to the depth data, though. The chlorophyll averages must not be sorted in ascending order; they need to stay in order of depth. Both lists are very long, so I can't do this by hand.

How would I make 0.5 m depth bins with averaged chlorophyll data in them?

Goal:

depth = [0.5,1.0,1.5,2.0,2.5]
chlorophyll = [avg1,avg2,avg3,avg4,avg5]

For example:

avg1 = np.mean([0.4,0.1,0.04,0.05,0.4])

Solution

  • One way is to use numpy.digitize to assign each depth to a bin.

    Then use a dictionary or list comprehension to average the chlorophyll values in each bin.

    import numpy as np
    
    chl  = np.array([0.4,0.1,0.04,0.05,0.4,0.2,0.6,0.09,0.23,0.43,0.65,0.22,0.12,0.2,0.33])
    depth = np.array([0.1,0.3,0.31,0.44,0.49,1.1,1.145,1.33,1.49,1.53,1.67,1.79,1.87,2.1,2.3])
    
    bins = np.array([0,0.5,1.0,1.5,2.0,2.5])
    
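    # Pair each chlorophyll value with the index of the bin its depth falls in
    A = np.vstack((np.digitize(depth, bins), chl)).T
    
    # Average the chlorophyll values per bin index, keyed by the bin's upper edge
    res = {bins[int(i)]: np.mean(A[A[:, 0] == i, 1]) for i in np.unique(A[:, 0])}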
    
    # {0.5: 0.198, 1.5: 0.28, 2.0: 0.355, 2.5: 0.265}
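To see why the keys above are the upper edges of each bin, it may help to look at what numpy.digitize returns on its own. This is only a small illustration; `sample_depths` and `upper_edges` are made-up names and values, not part of the question's data.

```python
import numpy as np

bins = np.array([0, 0.5, 1.0, 1.5, 2.0, 2.5])

# Hypothetical depths chosen to show the edge behaviour
sample_depths = np.array([0.1, 0.49, 0.5, 1.1, 2.3])

# With the default right=False, index i means bins[i-1] <= x < bins[i]
idx = np.digitize(sample_depths, bins)

# Indexing bins with idx therefore gives the upper edge of each depth's bin
upper_edges = bins[idx]

print(idx.tolist())          # [1, 1, 2, 3, 5]
print(upper_edges.tolist())  # [0.5, 0.5, 1.0, 1.5, 2.5]
```

Note that a depth of exactly 0.5 lands in the 0.5 to 1.0 bin, and that any depth at or beyond the last edge (2.5 here) gets index len(bins), so bins[idx] would raise an IndexError for it. Index 0 is reserved for depths below the first edge.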
    

    Or for the precise format you are after:

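    # Start at 1: index 0 is reserved for depths below bins[0].
    # Empty bins give nan (NumPy also emits a RuntimeWarning for them).
    res_lst = [np.mean(A[A[:, 0] == i, 1]) for i in range(1, len(bins))]
    
    # [0.198, nan, 0.28, 0.355, 0.265]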
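Since the lists are described as very long, a loop-free variant might be worth considering. The following is a sketch rather than the method above: it swaps the comprehension for numpy.bincount, which sums and counts per bin index in one pass. The names `n_slots`, `sums`, `counts`, `means` and `bin_upper_edges` are my own, and it assumes all depths fall inside the bin edges.

```python
import numpy as np

chl = np.array([0.4, 0.1, 0.04, 0.05, 0.4, 0.2, 0.6, 0.09, 0.23,
                0.43, 0.65, 0.22, 0.12, 0.2, 0.33])
depth = np.array([0.1, 0.3, 0.31, 0.44, 0.49, 1.1, 1.145, 1.33, 1.49,
                  1.53, 1.67, 1.79, 1.87, 2.1, 2.3])
bins = np.array([0, 0.5, 1.0, 1.5, 2.0, 2.5])

idx = np.digitize(depth, bins)   # 1..len(bins)-1 for in-range depths
n_slots = len(bins) + 1          # digitize can return 0..len(bins)

# Per-index sum of chlorophyll and per-index sample count
sums = np.bincount(idx, weights=chl, minlength=n_slots)
counts = np.bincount(idx, minlength=n_slots)

# Keep only the slots that correspond to real bins
sums, counts = sums[1:len(bins)], counts[1:len(bins)]

# Divide where a bin has samples; empty bins stay nan without a warning
means = np.full(sums.shape, np.nan)
np.divide(sums, counts, out=means, where=counts > 0)

bin_upper_edges = bins[1:]
print(bin_upper_edges.tolist())  # [0.5, 1.0, 1.5, 2.0, 2.5]
print(means)                     # approximately [0.198, nan, 0.28, 0.355, 0.265]
```

The averages stay in depth order because they are indexed by bin, not sorted by value.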