Search code examples
pythonnumpymachine-learningscipysignal-processing

Finding relevant points in the given curve


Consider the following Python code which plots a curve and analyzes it to find some points:

%matplotlib inline
import numpy as np
from numpy.polynomial.polynomial import Polynomial
from scipy.interpolate import UnivariateSpline
from scipy.signal import savgol_filter
import scipy.stats
import scipy.optimize
import matplotlib.pyplot as plt

# Create curve X axis.
x = np.arange(10000)
y = np.zeros_like(x, dtype=float)

# Create a straight line for the first 1000 points.
y[:1000] = (10.0,) * 1000
# The remaining points, create an exponential.
y[1000:] = 20 * np.exp(-x[1000:]/1200.0) 
# Add noise.
y += np.random.normal(0, 0.5, x.size)

# Create polynomial.
poly = Polynomial.fit(x, y, 15)
# Create Y values for the polynomial.
py = poly(x)

# Calculate first and second derivatives.
dxdy = np.gradient(py)
dx2dy2 = np.gradient(dxdy)

# Plot original curve and fit on top.
plt.plot(x, y, '', color='black')
plt.plot(x, py, '', color='red')

# Plot first order and second order derivatives.
plt.plot(x, dxdy * 100, '', color='blue')
plt.plot(x, dx2dy2, '', color='green')

enter image description here

I have marked in the image above the points I want to calculate:

I think I can use the first derivative to calculate the first one, which is the start of the transition.

I am not so sure how to calculate the second one, which is the end of the transition, or when the curve becomes flat. I have tried to calculate using the average of the last 100 points in the curve, and finding the first value in the curve below that average, however, it does not seem very reliable.

EDIT1:

While investigating the first derivative, I came up with the following potential solution, which is finding the change of sign to the left and right sides of the peak, I illustrate with an image of the first derivative and signal and fit below:

enter image description here enter image description here


Solution

  • I prefer to work on filtered (smoothed) rather than interpolated data.

    First point I find by:

    • finding maximum of the smoothed data
    • finding first point whose value is 90% of the maximum value
    • going back to find first point whose derivative is >= 0

    Second point I find by

    • finding minimum of the smoothed data
    • finding first point that has value minimum + (maximum - minimum) * 1%
    import numpy as np
    import matplotlib.pyplot as plt
    import scipy.ndimage
    
    # Create curve X axis.
    x = np.arange(10000)
    y = np.zeros_like(x, dtype=float)
    
    # Create a straight line for the first 1000 points.
    y[:1000] = (10.0,) * 1000
    # The remaining points, create an exponential.
    y[1000:] = 20 * np.exp(-x[1000:]/1200.0)
    # Add noise.
    y += np.random.normal(0, 0.5, x.size)
    
    y_smoothed = scipy.ndimage.uniform_filter1d(y, 250, mode='reflect')
    diff_y_smoothed = np.diff(y_smoothed)
    minimum = np.min(y_smoothed)
    maximum = np.max(y_smoothed)
    almost_first_point_idx = np.where(y_smoothed < 0.9 * maximum)[0][0]
    first_point_idx = almost_first_point_idx - np.where(diff_y_smoothed[:almost_first_point_idx][::-1] >= 0)[0][0]
    second_point_idx = np.where(y_smoothed < 0.01 * (maximum - minimum) + minimum)[0][0]
    
    # Plot original curve and fit on top.
    plt.plot(x, y, '', color='black')
    plt.plot(x, y_smoothed, '', color='red')
    plt.axvline(x[first_point_idx])
    plt.axvline(x[second_point_idx])
    plt.show()
    

    enter image description here

    You may wont to tweak a bit those parameters as running window size (hardcoded 250) and 1% used for finding second point.