Search code examples
pythonnumpymatplotlibscipypeakutils

How to get value of area under multiple peaks


I have some data from a bioanalyzer which gives me time (x-axis) and absorbance values (y-axis). The time is every .05 seconds and its from 32s to 138 so you can imagine how many data points I have. I've created a graph using plotly and matplotlib, just so that I have more libraries to work with to find a solution, so a solution in either library is ok! What I'm trying to do is make my script find the area under each peak and return my value.

def create_plot(sheet_name):
    sample = book.sheet_by_name(sheet_name)
    data = [[sample.cell_value(r, c) for r in range(sample.nrows)] for c in range(sample.ncols)]
    y = data[2][18:len(data[2]) - 2]
    x = np.arange(32, 138.05, 0.05)
    indices = peakutils.indexes(y, thres=0.35, min_dist=0.1)
    peaks = [y[i] for i in indices]

This snippet gets my Y values, X values and indices of the peaks. Now is there a way to get the area under each curve? Let's say that there are 15 indices.

Here's what the graph looks like:enter image description here


Solution

  • An automated answer

    Given a set of x and y values as well as a set of peaks (the x-coordinates of the peaks), here's how you can automatically find the area under each of the peaks. I'm assuming that x, y, and peaks are all Numpy arrays:

    import numpy as np
    
    # find the minima between each peak
    ixpeak = x.searchsorted(peaks)
    ixmin = np.array([np.argmin(i) for i in np.split(y, ixpeak)])
    ixmin[1:] += ixpeak
    mins = x[ixmin]
    
    # split up the x and y values based on those minima
    xsplit = np.split(x, ixmin[1:-1])
    ysplit = np.split(y, ixmin[1:-1])
    
    # find the areas under each peak
    areas = [np.trapz(ys, xs) for xs,ys in zip(xsplit, ysplit)]
    

    Output:

    enter image description here

    The example data has been set up so that the area under each peak is (more-or-less) guaranteed to be 1.0, so the results in the bottom plot are correct. The green X marks are the locations of the minimum between each two peaks. The part of the curve "belonging" to each peak is determined as the part of the curve in-between the minima adjacent to each peak.

    Complete code

    Here's the complete code I used to generate the example data:

    import scipy as sp
    import scipy.stats
    
    prec = 1e5
    n = 10
    N = 150
    r = np.arange(0, N+1, N//n)
    
    # generate some reasonable fake data
    peaks = np.array([np.random.uniform(s, e) for s,e in zip(r[:-1], r[1:])])
    x = np.linspace(0, N + n, num=int(prec))
    y = np.max([sp.stats.norm.pdf(x, loc=p, scale=.4) for p in peaks], axis=0)
    

    and the code I used to make the plots:

    import matplotlib.pyplot as plt
    
    # plotting stuff
    plt.figure(figsize=(5,7))
    plt.subplots_adjust(hspace=.33)
    plt.subplot(211)
    plt.plot(x, y, label='trace 0')
    plt.plot(peaks, y[ixpeak], '+', c='red', ms=10, label='peaks')
    plt.plot(mins, y[ixmin], 'x', c='green', ms=10, label='mins')
    plt.xlabel('dep')
    plt.ylabel('indep')
    plt.title('Example data')
    plt.ylim(-.1, 1.6)
    plt.legend()
    
    plt.subplot(212)
    plt.bar(np.arange(len(areas)), areas)
    plt.xlabel('Peak number')
    plt.ylabel('Area under peak')
    plt.title('Area under the peaks of trace 0')
    plt.show()